Is there any method to avoid this error, for example by adding parameters or something else?
Re: Installing the MLNX driver fails
Re: ConnectX VPI (MT26418) nic and SFP modules
Hi Kevin,
Thank you for contacting the Mellanox Community.
Based on the ConnectX VPI Product Brief ( http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX_VPI.pdf ) the card only supports microGiGaCN or QSFP connectors.
You can use the following link to find a validated cable for your ConnectX VPI.
LinkX™ - Mellanox Technologies
Thanks
Cheers,
~Martijn
Re: ibv_reg_mr got file exists error when used nv_peer_mem
Hi Haizhu,
Thank you for contacting the Mellanox Community.
For your test, please install the latest Mellanox OFED version and rerun the test with ib_send_bw WITHOUT CUDA to check whether RDMA itself is working properly, using the -d option to specify the device you want to use.
Example without CUDA
Server:
# ib_send_bw -d mlx5_0 -i 1 -a -F --report_gbits
Client:
# ib_send_bw -d mlx5_0 -i 1 -a -F --report_gbits <ip-address-server>
Example with CUDA
Server:
# ib_send_bw -d mlx5_0 -i 1 -a -F --report_gbits --use_cuda
Client:
# ib_send_bw -d mlx5_0 -i 1 -a -F --report_gbits --use_cuda <ip-address-server>
We also recommend following the benchmark test in the GPUDirect User Manual ( http://www.mellanox.com/related-docs/prod_software/Mellanox_GPUDirect_User_Manual_v1.5.pdf ), Section 3.
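If you want to isolate where the --use_cuda path fails outside of perftest, the sketch below (my own illustration, not taken from the GPUDirect manual; the device index, buffer size, and build flags are assumptions you may need to adjust) registers a CUDA buffer with ibv_reg_mr, which is essentially the step perftest performs when --use_cuda is given and the step that needs nv_peer_mem:

/* gpu_reg_check.c - sketch: register a CUDA buffer with ibv_reg_mr.
 * Build (paths are assumptions; adjust to your install):
 *   gcc gpu_reg_check.c -o gpu_reg_check \
 *       -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -libverbs -lcudart
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <infiniband/verbs.h>
#include <cuda_runtime.h>

int main(void)
{
    int num = 0;
    struct ibv_device **dev_list = ibv_get_device_list(&num);
    if (!dev_list || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    /* Open the first HCA (e.g. mlx4_0); pick another index if needed. */
    struct ibv_context *ctx = ibv_open_device(dev_list[0]);
    if (!ctx) {
        fprintf(stderr, "ibv_open_device: %s\n", strerror(errno));
        return 1;
    }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!pd) {
        fprintf(stderr, "ibv_alloc_pd: %s\n", strerror(errno));
        return 1;
    }

    /* Allocate 1 MiB on the GPU; registering this pointer requires a
     * peer-memory client such as nv_peer_mem. */
    void *gpu_buf = NULL;
    size_t len = 1 << 20;
    cudaError_t cerr = cudaMalloc(&gpu_buf, len);
    if (cerr != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(cerr));
        return 1;
    }

    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        fprintf(stderr, "ibv_reg_mr on GPU memory failed: %s\n", strerror(errno));
    else
        printf("GPU buffer registered, lkey=0x%x\n", mr->lkey);

    if (mr) ibv_dereg_mr(mr);
    cudaFree(gpu_buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(dev_list);
    return mr ? 0 : 1;
}

Without nv_peer_mem loaded, a "Bad address" (EFAULT) failure here is expected, because there is no peer-memory client to pin the GPU buffer. If the registration also fails with nv_peer_mem loaded, the problem lies in the kernel/OFED/nv_peer_mem combination rather than in perftest itself.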
For further support, we recommend opening a support case with Mellanox Support.
Thanks.
Cheers,
~Martijn
Re: ibv_reg_mr got file exists error when used nv_peer_mem
Hi Martijn,
Thank you for your reply about the issue.
I didn't describe the problem clearly; the hardware and software environment is listed below:
1. Hardware:
ConnectX-3 (Mellanox Technologies MT27500 Family [ConnectX-3])
Nvidia K80
2. Software:
ubuntu-16.04, kernel 4.8.7
nvidia-driver: nvidia-diag-driver-local-repo-ubuntu1604-384.66_1.0-1_amd64.deb (download site: NVIDIA DRIVERS Tesla Driver for Ubuntu 16.04)
cuda-toolkit: cuda_8.0.61_375.26_linux.run (CUDA Toolkit Download | NVIDIA Developer )
MLNX_OFED: MLNX_OFED_SRC-debian-4.1-1.0.2.0.tgz http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.1-1.0.2.0/MLNX_OFED_SRC-debian-4.1-1.0.2.0.tgz
nv_peer_mem: 1.0.5
I have two servers; one of them has a K80 GPU. I want to use perftest to test RDMA and GPUDirect. Following that reference, I installed nv_peer_mem on the server with the K80 GPU.
When I don't use --use_cuda, ib_write_bw works well, but when I use --use_cuda it fails. Printing the error message shows that ib_write_bw runs into ibv_reg_mr and gets the error "File exists". If I don't insmod nv_peer_mem, ibv_reg_mr gets the error "Bad address" instead.
For background: I had run the same experiment successfully before, with kernel 4.4.0 and MLNX_OFED 4.0-2.0.0.1, and without NVMe over Fabrics installed. Then my colleague installed kernel 4.8.7 and NVMe over Fabrics. Since then, ib_write_bw with --use_cuda has never run correctly.
Is there anything wrong with my experiment or with the environment? And another question: can a single ConnectX-3 support NVMe over Fabrics and GPUDirect RDMA at the same time?
Thanks again, and looking forward to your reply.
Yours
Haizhu Shao
Re: Win server 2016 Switch Embedded Teaming (SET) and SR-IOV
Hello Andrzej,
This problem has been fixed in Windows Server 2016 with the latest Windows updates.
Please use our latest WinOF-2 drivers from http://www.mellanox.com/page/products_dyn?product_family=32&mtag=windows_sw_drivers .
B.R
Vitaliy
RDS-TOOLS PACKAGE ON MLNX_OFED_LINUX-4.1-1.0.2.0-fc24-x86_64.iso ?
Hello
I have installed Mellanox OFED for Linux software version 4.1-1.0.2.0 (MLNX_OFED_LINUX-4.1-1.0.2.0-fc24-x86_64.iso) on Fedora 24, but I cannot find the rds-tools package. Can you tell me where I can find the rds-tools package for OFED Linux software version 4.1-1.0.2.0? I'm looking for this package to use the rds-ping and rds-stress tools.
Here is the content of my RPMS directory ->
[root@aigle CDROM]# cd RPMS
[root@aigle RPMS]# ll *rds*
ls: cannot access '*rds*': No such file or directory
[root@aigle RPMS]# rpm -qa *.rpm | grep rds
[root@aigle RPMS]#
[root@aigle RPMS]# ll
total 119846
-r--r--r--. 1 abdel abdel 151642 Jun 27 18:05 ar_mgr-1.0-0.34.g9bd7c9a.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 63946 Jun 27 18:04 cc_mgr-1.0-0.33.g9bd7c9a.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 272414 Jun 27 17:56 dapl-2.1.10mlnx-OFED.3.4.2.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 46998 Jun 27 17:56 dapl-devel-2.1.10mlnx-OFED.3.4.2.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 151326 Jun 27 17:56 dapl-devel-static-2.1.10mlnx-OFED.3.4.2.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 138054 Jun 27 17:56 dapl-utils-2.1.10mlnx-OFED.3.4.2.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 13098 Jun 27 18:04 dump_pr-1.0-0.29.g9bd7c9a.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 4678882 Jun 27 18:15 hcoll-3.8.1649-1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 84054 Jun 27 17:52 ibacm-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 9918 Jun 27 17:52 ibacm-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 1207010 Jun 27 17:42 ibdump-5.0.0-1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 70838 Jun 27 17:52 ibsim-0.6mlnx1-0.8.g9d76581.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 1801990 Jun 27 18:03 ibutils2-2.1.1-0.91.MLNX20170612.g2e0d52a.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 364950 Jun 27 18:05 infiniband-diags-1.6.7.MLNX20170511.7595646-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 34762 Jun 27 18:05 infiniband-diags-compat-1.6.7.MLNX20170511.7595646-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 27702 Jun 27 18:05 infiniband-diags-guest-1.6.7.MLNX20170511.7595646-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 29574 Jun 27 17:50 iser-4.0-OFED.4.1.1.0.2.1.gc22af88.kver.4.5.5_300.fc24.x86_64.x86_64.rpm
-r--r--r--. 1 abdel abdel 13454 Jun 27 17:50 kernel-mft-4.7.0-41.kver.4.5.5_300.fc24.x86_64.x86_64.rpm
-r--r--r--. 1 abdel abdel 69162 Jun 27 17:49 knem-1.1.2.90mlnx2-OFED.4.0.1.6.3.1.g4faa297.kver.4.5.5_300.fc24.x86_64.x86_64.rpm
-r--r--r--. 1 abdel abdel 22730 Jun 27 17:52 libibcm-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm
-r--r--r--. 1 abdel abdel 22042 Jun 27 17:52 libibcm-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 17662 Jun 27 17:52 libibcm-devel-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm
-r--r--r--. 1 abdel abdel 17502 Jun 27 17:52 libibcm-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 69438 Jun 27 17:52 libibmad-1.3.13.MLNX20170511.267a441-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 68510 Jun 27 17:52 libibmad-1.3.13.MLNX20170511.267a441-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 17262 Jun 27 17:52 libibmad-devel-1.3.13.MLNX20170511.267a441-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 17238 Jun 27 17:52 libibmad-devel-1.3.13.MLNX20170511.267a441-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 36510 Jun 27 17:52 libibmad-static-1.3.13.MLNX20170511.267a441-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 38106 Jun 27 17:52 libibmad-static-1.3.13.MLNX20170511.267a441-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 510193 Jun 27 18:25 libibprof-1.1.41-1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 68994 Jun 27 17:52 libibumad-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 68658 Jun 27 17:52 libibumad-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 12830 Jun 27 17:52 libibumad-devel-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 12814 Jun 27 17:52 libibumad-devel-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 20014 Jun 27 17:52 libibumad-static-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 20882 Jun 27 17:52 libibumad-static-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 91114 Jun 27 17:39 libibverbs-41mlnx1-OFED.4.1.0.1.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 87870 Jun 27 17:38 libibverbs-41mlnx1-OFED.4.1.0.1.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 175950 Jun 27 17:39 libibverbs-devel-41mlnx1-OFED.4.1.0.1.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 175918 Jun 27 17:38 libibverbs-devel-41mlnx1-OFED.4.1.0.1.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 51542 Jun 27 17:39 libibverbs-devel-static-41mlnx1-OFED.4.1.0.1.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 51414 Jun 27 17:38 libibverbs-devel-static-41mlnx1-OFED.4.1.0.1.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 122938 Jun 27 17:39 libibverbs-utils-41mlnx1-OFED.4.1.0.1.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 124446 Jun 27 17:38 libibverbs-utils-41mlnx1-OFED.4.1.0.1.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 68870 Jun 27 17:50 libmlx4-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm
-r--r--r--. 1 abdel abdel 66642 Jun 27 17:50 libmlx4-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 58786 Jun 27 17:50 libmlx4-devel-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm
-r--r--r--. 1 abdel abdel 60682 Jun 27 17:50 libmlx4-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 130934 Jun 27 17:51 libmlx5-41mlnx1-OFED.4.1.0.1.5.41102.i686.rpm
-r--r--r--. 1 abdel abdel 128794 Jun 27 17:51 libmlx5-41mlnx1-OFED.4.1.0.1.5.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 129798 Jun 27 17:51 libmlx5-devel-41mlnx1-OFED.4.1.0.1.5.41102.i686.rpm
-r--r--r--. 1 abdel abdel 134494 Jun 27 17:51 libmlx5-devel-41mlnx1-OFED.4.1.0.1.5.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 70494 Jun 27 17:53 librdmacm-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm
-r--r--r--. 1 abdel abdel 72106 Jun 27 17:53 librdmacm-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 119514 Jun 27 17:53 librdmacm-devel-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm
-r--r--r--. 1 abdel abdel 127738 Jun 27 17:53 librdmacm-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 84530 Jun 27 17:53 librdmacm-utils-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm
-r--r--r--. 1 abdel abdel 83722 Jun 27 17:53 librdmacm-utils-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 24714 Jun 27 17:51 librxe-41mlnx1-OFED.4.1.0.1.7.41102.i686.rpm
-r--r--r--. 1 abdel abdel 23854 Jun 27 17:51 librxe-41mlnx1-OFED.4.1.0.1.7.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 12318 Jun 27 17:51 librxe-devel-static-41mlnx1-OFED.4.1.0.1.7.41102.i686.rpm
-r--r--r--. 1 abdel abdel 11982 Jun 27 17:51 librxe-devel-static-41mlnx1-OFED.4.1.0.1.7.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 706362 Jun 27 18:29 libvma-8.3.7-1.x86_64.rpm
-r--r--r--. 1 abdel abdel 15254 Jun 27 18:29 libvma-devel-8.3.7-1.x86_64.rpm
-r--r--r--. 1 abdel abdel 37686 Jun 27 18:29 libvma-utils-8.3.7-1.x86_64.rpm
-r--r--r--. 1 abdel abdel 65966777 Jun 21 09:18 mft-4.7.0-41.x86_64.rpm
-r--r--r--. 1 abdel abdel 114746 Jun 27 18:34 mlnx-ethtool-4.2-1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 10578239 Jun 27 21:34 mlnx-fw-updater-4.1-1.0.2.0.x86_64.rpm
-r--r--r--. 1 abdel abdel 100742 Jun 27 17:49 mlnx-ofa_kernel-4.1-OFED.4.1.1.0.2.1.gc22af88.fc24.x86_64.rpm
-r--r--r--. 1 abdel abdel 4801566 Jun 27 17:49 mlnx-ofa_kernel-devel-4.1-OFED.4.1.1.0.2.1.gc22af88.fc24.x86_64.rpm
-r--r--r--. 1 abdel abdel 948986 Jun 27 17:49 mlnx-ofa_kernel-modules-4.1-OFED.4.1.1.0.2.1.gc22af88.kver.4.5.5_300.fc24.x86_64.x86_64.rpm
-r--r--r--. 1 abdel abdel 6693 Jun 27 21:33 mlnx-ofed-all-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 4835 Jun 27 21:33 mlnx-ofed-basic-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 55356 Jun 27 21:33 mlnxofed-docs-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 4869 Jun 27 21:33 mlnx-ofed-dpdk-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 6135 Jun 27 21:34 mlnx-ofed-guest-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 6226 Jun 27 21:34 mlnx-ofed-hpc-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 6263 Jun 27 21:34 mlnx-ofed-hypervisor-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 4537 Jun 27 21:34 mlnx-ofed-kernel-only-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 6541 Jun 27 21:34 mlnx-ofed-vma-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 6245 Jun 27 21:34 mlnx-ofed-vma-eth-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 6577 Jun 27 21:34 mlnx-ofed-vma-vpi-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm
-r--r--r--. 1 abdel abdel 28750 Jun 27 18:10 mpi-selector-1.0.3-1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 448546 Jun 27 18:35 mpitests_openmpi-3.2.19-acade41.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 991474 Jun 27 17:40 mstflint-4.7.0-1.6.g26037b7.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 3782129 Jun 27 18:07 mxm-3.6.3102-1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 45270 Jun 27 17:38 ofed-scripts-4.1-OFED.4.1.1.0.2.x86_64.rpm
-r--r--r--. 1 abdel abdel 13966862 Jun 27 18:25 openmpi-2.1.2a1-1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 818378 Jun 27 17:55 opensm-4.9.0.MLNX20170607.280b8f7-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 824930 Jun 27 17:54 opensm-4.9.0.MLNX20170607.280b8f7-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 228774 Jun 27 17:55 opensm-devel-4.9.0.MLNX20170607.280b8f7-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 228778 Jun 27 17:54 opensm-devel-4.9.0.MLNX20170607.280b8f7-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 76098 Jun 27 17:55 opensm-libs-4.9.0.MLNX20170607.280b8f7-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 71238 Jun 27 17:54 opensm-libs-4.9.0.MLNX20170607.280b8f7-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 68354 Jun 27 17:55 opensm-static-4.9.0.MLNX20170607.280b8f7-0.1.41102.i686.rpm
-r--r--r--. 1 abdel abdel 67522 Jun 27 17:54 opensm-static-4.9.0.MLNX20170607.280b8f7-0.1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 254870 Jun 27 17:57 perftest-4.1-0.4.g16dbf63.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 63946 Jun 27 18:05 qperf-0.4.9-9.41102.x86_64.rpm
dr-xr-xr-x. 2 abdel abdel 2048 Jun 27 21:34 repodata
-r--r--r--. 1 abdel abdel 1674339 Jun 27 18:10 sharp-1.3.1.MLNX20170625.859dc24-1.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 857178 Jun 27 18:33 sockperf-3.1-14.gita9f6056282ef.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 40270 Jun 27 17:50 srp-4.0-OFED.4.1.1.0.2.1.gc22af88.kver.4.5.5_300.fc24.x86_64.x86_64.rpm
-r--r--r--. 1 abdel abdel 51066 Jun 27 17:57 srptools-41mlnx1-4.41102.x86_64.rpm
-r--r--r--. 1 abdel abdel 2156449 Jun 27 18:08 ucx-1.2.2947-1.41102.x86_64.rpm
thanks
Re: ConnectX-3 Pro connecting at 10g instead of 40g
The cable was the issue. I got a cable from the list provided, did the loop test, and the speed went up to 40G. It also brought the speed up to 40G between the Windows and VMware servers.
This is the cable I used: link
Thank you.
How to test RDMA traffic congestion
Hi. We're trying to debug issues we see periodically with Lustre Networking on top of CX-3 and CX-4 based RoCE(v1) fabrics using SR-IOV for connections from Lustre clients running as KVM guests (servers are bare-metal). When we hit these errors we see drop/error counters going up on the hosts.
So far all simple IB tests between host pairs look OK; now we want to test congestion scenarios, e.g. two hosts sending to one host. However, we've discovered that although ib_write_bw has an option to specify more than one QP, it doesn't actually support it! Is there a simple way to engineer such a test, or are we going to have to write something ourselves or move to an MPI-based test suite?
Question about ESXi 6.5 iSER driver with PFC port configuration.
Hi!
I saw that a Mellanox iSER driver for ESXi 6.5 has been released.
But according to the driver's manual, this driver seems to need a Global Pause configuration on each port.
Is there any way to use a PFC configuration on the switch ports instead?
Best Regards,
P.S. Why does the iSER storage adapter disappear after every ESXi 6.5 host reboot?
P.S. 2: This Global Pause based iSER initiator can't connect to an SCST Ethernet iSER target.
How can I resolve this problem?
Tested iSER targets:
01. LIO
02. SCST
03. StarWind vSAN
LAG problems
My team is currently standing up a new cluster that has an SN2700 core Ethernet switch on our boot network. LAG links are working fine between this core and the leaf switches in the new cluster. We also have an older cluster with an SX1036 Ethernet switch serving as its core switch. LAG links are also working fine between this older core switch and the older leaf switches in that cluster. Several of us have tried to get LAG working between the SX1036 and the SN2700, but we can't get a working link (a single link works fine). We've done typical troubleshooting, looking for bad cables/ports, etc. We can find no differences when comparing the configurations and status of working LAG links and the failing link.
The SX1036 is a PPC switch and is running a much older firmware:
Product name: MLNX-OS
Product release: 3.4.3002
Build ID: #1-dev
Build date: 2015-07-30 20:13:15
Target arch: ppc
Target hw: m460ex
Built by: jenkins@fit74
Version summary: PPC_M460EX 3.4.3002 2015-07-30 20:13:15 ppc
Product model: ppc
than the SN2700 (X86):
Product name: MLNX-OS
Product release: 3.6.3200
Build ID: #1-dev
Build date: 2017-03-09 17:55:58
Target arch: x86_64
Target hw: x86_64
Built by: jenkins@e3f42965d5ee
Version summary: X86_64 3.6.3200 2017-03-09 17:55:58 x86_64
Product model: x86onie
The obvious thing to try is updating the firmware on the SX1036, but this cluster is in production and our team is nervous about messing with that core switch, as it's pretty critical to our infrastructure. Would a firmware mismatch cause this behavior?
I have seen documentation indicating that MLAG doesn't work between PPC and X86 switches. I sure hope that's not the case for LAG...
iSER driver for ESXi 6.5 - Does it support ConnectX-3 or not
Issues with setting up Storage Spaces Direct
Hello everyone,
I am working on setting up an S2D cluster and have run into an issue where I am unable to get my nodes to communicate via RDMA. I have used the Test-RDMA.ps1 script and DISKSPD provided in another post in the Mellanox Community.
Here is my hardware configuration:
Configuration: 4 nodes with the following configuration on each
Hardware: Intel R2224WTTYSR Server Systems
256GB Samsung DDR4 LRDIMMs
2x Intel E5-2620 v4 Xeon CPU
1x Mellanox ConnectX4 - MCX414A-BCAT
1x Broadcom LSI 3805-24i HBA
2x Intel DC P3700 800GB for Journal\cache drives
4x Seagate 2TB SAS HDs for Capacity drives
Networking: 1x Netgear 10GbE network switch for VMs
2x Mellanox SX1012 12 Port QSFP28 Switch for RDMA\cluster Traffic
8x MC2210128-003 Mellanox LinkX Cables
We are not utilizing SET teams; the ConnectX-4 NICs carry only RoCE storage traffic.
All nodes are set up with this configuration for the RDMA-enabled NICs:
Attached is the configuration of my Mellanox SX1012 switch.
Any help in the right direction is very appreciated.
Thanks!
ceph + rdma error: ibv_open_device failed
I followed this doc:
Bring Up Ceph RDMA - Developer's Guide
But the mon could not start, failing with this error:
7f5acb890700 -1 Infiniband Device open rdma device failed. (2) No such file or directory
I checked ceph code:
116 name = ibv_get_device_name(device);
117 ctxt = ibv_open_device(device);
118 if (ctxt == NULL) {
119 lderr(cct) << __func__ << " open rdma device failed. " << cpp_strerror(errno) << dendl;
120 ceph_abort();
121 }
Then
gdb info:
Breakpoint 1, Device::Device (this=0x55555f3597e0, cct=0x55555eec01c0, d=<optimized out>)
at /usr/src/debug/ceph-12.2.0/src/msg/async/rdma/Infiniband.cc:116
116 name = ibv_get_device_name(device);
(gdb) print *(struct ibv_device *) device
$7 = {ops = {alloc_context = 0x0, free_context = 0x0}, node_type = IBV_NODE_CA, transport_type = IBV_TRANSPORT_IB,
  name = "mlx4_0", '\000' <repeats 57 times>, dev_name = "uverbs0", '\000' <repeats 56 times>,
  dev_path = "/sys/class/infiniband_verbs/uverbs0", '\000' <repeats 220 times>,
  ibdev_path = "/sys/class/infiniband/mlx4_0", '\000' <repeats 227 times>}
Breakpoint 2, Device::Device (this=0x55555f3597e0, cct=0x55555eec01c0, d=<optimized out>)
at /usr/src/debug/ceph-12.2.0/src/msg/async/rdma/Infiniband.cc:117
117 ctxt = ibv_open_device(device);
(gdb) print *(struct CephContext *) ctxt
Cannot access memory at address 0x646f6e2f305f3478
It seems that ibv_open_device failed
# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 1
Firmware version: 2.9.1000
Hardware version: b0
Node GUID: 0x0002c90300589efc
System image GUID: 0x0002c90300589eff
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 33
LMC: 0
SM lid: 23
Capability mask: 0x0251086a
Port GUID: 0x0002c90300589efd
Link layer: InfiniBand
Is there any problem with the data of struct ibv_device?
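A minimal standalone check along these lines (a sketch, not from the Ceph RDMA guide; it only assumes the libibverbs development headers are installed) repeats just the ibv_get_device_list / ibv_open_device calls, which can show whether the verbs layer fails with the same "(2) No such file or directory" outside of ceph-mon:

/* verbs_open_check.c - open every RDMA device and report errno.
 * Build: gcc verbs_open_check.c -o verbs_open_check -libverbs
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list) {
        fprintf(stderr, "ibv_get_device_list: %s\n", strerror(errno));
        return 1;
    }
    printf("found %d device(s)\n", num);

    for (int i = 0; i < num; i++) {
        const char *name = ibv_get_device_name(list[i]);
        struct ibv_context *ctx = ibv_open_device(list[i]);
        if (!ctx) {
            /* ENOENT (2) here often means the userspace provider library
             * (e.g. libmlx4) or /dev/infiniband/uverbs* is missing or not
             * accessible to this process. */
            fprintf(stderr, "%s: ibv_open_device failed: %s\n",
                    name, strerror(errno));
            continue;
        }
        printf("%s: opened OK (uverbs device: %s)\n", name, list[i]->dev_name);
        ibv_close_device(ctx);
    }
    ibv_free_device_list(list);
    return 0;
}

If this opens mlx4_0 successfully under the same user and environment that runs ceph-mon, the difference is likely in how the mon process is started (permissions on /dev/infiniband/uverbs0, a chrooted or containerized environment, or a missing provider library in that environment) rather than in the contents of struct ibv_device, which look normal in the gdb output above.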
Re: iSER driver for ESXi 6.5 - Does it support ConnectX-3 or not
Here are some links that will guide you through the proper installation and implementation of the iSER driver v1.0.0.1 along with the ESXi 6.5 ConnectX-3/Pro inbox driver v3.16.0.0:
- The inbox driver v3.16.0.0 supports only CX-3/Pro adapters
- The iSER driver supports all CX-3, CX-4 & CX-5 adapters
- iSER driver:
- Release-Note: http://www.mellanox.com/related-docs/prod_software/Mellanox_MLNX-NATIVE-ESX-iSER_Driver_for_VMware_ESXi_6.5_Release_Notes_v1.0.0.1.pdf
- Instructions on how to install are in the Quick start guide: http://www.mellanox.com/related-docs/prod_software/Mellanox_MLNX-NATIVE-ESX-iSER_Driver_for_VMware_ESXi_6.5_Quick_Start_Guide_v1.0.0.1.pdf
- As for the inbox driver: it is part of the ESXi 6.5 OS and can be listed by running: # esxcli software vib list
Hope this helps
Re: iSER driver for ESXi 6.5 - Does it support ConnectX-3 or not
Your guide shows me an old ESXi 6.0 interface figure and an old Global Pause switch configuration.
Do you have a real-world guide for RoCE v1 beyond the one in the iSER Quick Start Guide on your site?
Best Regards,
Jae-Hoon Choi
How do I disable FEC for MCX416A-CCAT on windows
I have an MCX416A-CCAT. I already set the speed to 100Gb and disabled auto-negotiation, and I was wondering if there is a way to disable FEC.
Re: ConnectX-3 WinOF 5.35 on Win2016 Multiple Partitions
I am not sure if this applies, but Linux has a max IPoIB MTU of 2048 (2044) while Windows has a max of 4096 (4092).
When connecting the two different OSes you must use the smaller MTU, 2044. This normally has to be configured manually on the Windows servers. In the past, failure to set a common MTU yielded poor performance or connectivity issues.
It has been a while since we used both together so things may have changed.
Re: Issues with setting up Storage Spaces Direct
We are using S2D with IB on ConnectX 3 cards. No problems.
I am not familiar with the Test-RDMA script.
How are you testing RDMA? I use Windows Performance Monitor. There are RDMA specific counters which make it easy to track RDMA traffic. Other than setup and drivers there was not much else to do.
Have you installed the latest drivers for your cards?
IB Switch IS5035 MTU Setting?
I am wondering if someone can help with this.
We have 2x Mellanox IS5035 Switches running SM.
I have set the MTU on all of our Mellanox ConnectX3 Adapters on our Windows 2012 and 2016 Servers to 4092.
We get the following error in the Windows Server 2016 Event log
"According to the configuration under the "Jumbo Packets" advanced property, the MTU configured for device Mellanox ConnectX-3 IPoIB Adapter is 4092. The effective MTU is the supplied value + 4 bytes (for the IPoIB header). This configuration exceeds the MTU reported by OpenSM, which is 2048. This inconsistency may result in communication failures. Please change the MTU of IPoIB or OpenSM, and restart the driver."
In the IS5035 under Fabric MGMT > Partitions the Default MTU is set to Default.
Is this the correct place to change the SM MTU?
Will it take effect right away?
Will there be any interruption in traffic? These are production servers, so I need to know the impact.
Thanks!
Todd
Re: RDS-TOOLS PACKAGE ON MLNX_OFED_LINUX-4.1-1.0.2.0-fc24-x86_64.iso ?
Hi Minouche S.A.
Thank you for contacting the Mellanox Community and posting your question.
Unfortunately, as mentioned in the Mellanox OFED Driver Release Notes ( http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_1-1_0_2_0.pdf ), Section 2.2 "Unsupported Functionalities/Features/HCAs", Mellanox does not support RDS. Therefore, no RDS module and/or RDS-related tools are supplied with the latest Mellanox OFED driver. Some older driver versions may contain a version of the RDS module, but Mellanox still does not support its use.
For support and use of RDS, the OS vendor's inbox driver and tools should be used.
Thanks.
Cheers,
~Martijn