Channel: Mellanox Interconnect Community: Message List

Re: Installing the MLNX driver fails


Is there any way to avoid this error, for example by adding parameters or other options?


Re: ConnectX VPI (MT26418) NIC and SFP modules

Re: ibv_reg_mr got "File exists" error when using nv_peer_mem


Hi Haizhu,

 

Thank you for contacting the Mellanox Community.

 

For your test, please install the latest Mellanox OFED version and rerun the test with ib_send_bw WITHOUT CUDA to check whether RDMA is working properly, including the option to specify the device you want to use (-d).

Example without CUDA

Server:

# ib_send_bw -d mlx5_0 -i 1 -a -F --report_gbits

Client:

# ib_send_bw -d mlx5_0 -i 1 -a -F --report_gbits <ip-address-server>

 

 

Example with CUDA

Server:

# ib_send_bw -d mlx5_0 -i 1 -a -F --report_gbits --use_cuda

Client:

# ib_send_bw -d mlx5_0 -i 1 -a -F --report_gbits --use_cuda <ip-address-server>

 

Also, we recommend following the benchmark test in the GPUDirect User Manual ( http://www.mellanox.com/related-docs/prod_software/Mellanox_GPUDirect_User_Manual_v1.5.pdf ), Section 3.

 

For further support, we recommend opening a support case with Mellanox Support.

 

Thanks.

 

Cheers,

~Martijn

Re: ibv_reg_mr got "File exists" error when using nv_peer_mem


Hi Martijn,

Thank you for your reply about the issue.

 

I didn't describe the problem clearly; the hardware and software environment is listed below:

1. Hardware:

ConnectX-3 (Mellanox Technologies MT27500 Family [ConnectX-3])

Nvidia K80

2. Software:

Ubuntu 16.04, kernel 4.8.7

nvidia-driver: nvidia-diag-driver-local-repo-ubuntu1604-384.66_1.0-1_amd64.deb (download site: NVIDIA DRIVERS Tesla Driver for Ubuntu 16.04)

cuda-toolkit: cuda_8.0.61_375.26_linux.run (CUDA Toolkit Download | NVIDIA Developer )

MLNX_OFED: MLNX_OFED_SRC-debian-4.1-1.0.2.0.tgz  http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.1-1.0.2.0/MLNX_OFED_SRC-debian-4.1-1.0.2.0.tgz

nv_peer_mem: 1.0.5

 

I have two servers, one of which has a K80 GPU. I want to use perftest to test RDMA and GPUDirect. Following that reference, I installed nv_peer_mem on the server with the K80 GPU.

When I don't use --use_cuda, ib_write_bw works well, but when I use --use_cuda it fails. I printed the error message: ib_write_bw runs into ibv_reg_mr and then gets the error "File exists". If I don't insmod nv_peer_mem, ibv_reg_mr gets the error "Bad address".
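For reference, the failing sequence looks roughly like this on the client side (a sketch; the device name and server address are placeholders):

# lsmod | grep nv_peer_mem                                            # peer-memory module is loaded
# ib_write_bw -d mlx4_0 -a -F --report_gbits <server-ip>              # works
# ib_write_bw -d mlx4_0 -a -F --report_gbits --use_cuda <server-ip>   # ibv_reg_mr fails with "File exists"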

 

The background is that I had run the same experiment successfully before, using kernel 4.4.0 and MLNX_OFED 4.0-2.0.0.1, without NVMe over Fabrics installed. Then my workmate installed kernel 4.8.7 and NVMe over Fabrics. Since then, ib_write_bw with --use_cuda has never run correctly.

 

Is there anything wrong with my experiment or my environment? And another question: can a single ConnectX-3 support NVMe over Fabrics and GPUDirect RDMA at the same time?

 

 

 

Thanks very much for your reply, and I look forward to hearing from you again.

 

Yours

Haizhu Shao

 

Re: Win server 2016 Switch Embedded Teaming (SET) and SR-IOV

RDS-TOOLS PACKAGE ON MLNX_OFED_LINUX-4.1-1.0.2.0-fc24-x86_64.iso ?


Hello

 

I have installed Mellanox OFED for Linux software version 4.1-1.0.2.0 (MLNX_OFED_LINUX-4.1-1.0.2.0-fc24-x86_64.iso) on Fedora 24, but I do not find the rds-tools package. Can you tell me where I can find the rds-tools package for OFED for Linux software version 4.1-1.0.2.0? I'm looking for this package to use the rds-ping and rds-stress tools.

 

Here is the content of my RPMS directory:

 

[root@aigle CDROM]# cd RPMS

 

[root@aigle RPMS]# ll *rds*

ls: cannot access '*rds*': No such file or directory

 

[root@aigle RPMS]# rpm -qa *.rpm | grep rds

[root@aigle RPMS]#

 

[root@aigle RPMS]# ll

total 119846

-r--r--r--. 1 abdel abdel   151642 Jun 27 18:05 ar_mgr-1.0-0.34.g9bd7c9a.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    63946 Jun 27 18:04 cc_mgr-1.0-0.33.g9bd7c9a.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   272414 Jun 27 17:56 dapl-2.1.10mlnx-OFED.3.4.2.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    46998 Jun 27 17:56 dapl-devel-2.1.10mlnx-OFED.3.4.2.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   151326 Jun 27 17:56 dapl-devel-static-2.1.10mlnx-OFED.3.4.2.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   138054 Jun 27 17:56 dapl-utils-2.1.10mlnx-OFED.3.4.2.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    13098 Jun 27 18:04 dump_pr-1.0-0.29.g9bd7c9a.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel  4678882 Jun 27 18:15 hcoll-3.8.1649-1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    84054 Jun 27 17:52 ibacm-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel     9918 Jun 27 17:52 ibacm-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel  1207010 Jun 27 17:42 ibdump-5.0.0-1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    70838 Jun 27 17:52 ibsim-0.6mlnx1-0.8.g9d76581.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel  1801990 Jun 27 18:03 ibutils2-2.1.1-0.91.MLNX20170612.g2e0d52a.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   364950 Jun 27 18:05 infiniband-diags-1.6.7.MLNX20170511.7595646-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    34762 Jun 27 18:05 infiniband-diags-compat-1.6.7.MLNX20170511.7595646-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    27702 Jun 27 18:05 infiniband-diags-guest-1.6.7.MLNX20170511.7595646-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    29574 Jun 27 17:50 iser-4.0-OFED.4.1.1.0.2.1.gc22af88.kver.4.5.5_300.fc24.x86_64.x86_64.rpm

-r--r--r--. 1 abdel abdel    13454 Jun 27 17:50 kernel-mft-4.7.0-41.kver.4.5.5_300.fc24.x86_64.x86_64.rpm

-r--r--r--. 1 abdel abdel    69162 Jun 27 17:49 knem-1.1.2.90mlnx2-OFED.4.0.1.6.3.1.g4faa297.kver.4.5.5_300.fc24.x86_64.x86_64.rpm

-r--r--r--. 1 abdel abdel    22730 Jun 27 17:52 libibcm-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm

-r--r--r--. 1 abdel abdel    22042 Jun 27 17:52 libibcm-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    17662 Jun 27 17:52 libibcm-devel-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm

-r--r--r--. 1 abdel abdel    17502 Jun 27 17:52 libibcm-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    69438 Jun 27 17:52 libibmad-1.3.13.MLNX20170511.267a441-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    68510 Jun 27 17:52 libibmad-1.3.13.MLNX20170511.267a441-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    17262 Jun 27 17:52 libibmad-devel-1.3.13.MLNX20170511.267a441-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    17238 Jun 27 17:52 libibmad-devel-1.3.13.MLNX20170511.267a441-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    36510 Jun 27 17:52 libibmad-static-1.3.13.MLNX20170511.267a441-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    38106 Jun 27 17:52 libibmad-static-1.3.13.MLNX20170511.267a441-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   510193 Jun 27 18:25 libibprof-1.1.41-1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    68994 Jun 27 17:52 libibumad-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    68658 Jun 27 17:52 libibumad-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    12830 Jun 27 17:52 libibumad-devel-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    12814 Jun 27 17:52 libibumad-devel-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    20014 Jun 27 17:52 libibumad-static-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    20882 Jun 27 17:52 libibumad-static-13.10.2.MLNX20170511.dcc9f7a-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    91114 Jun 27 17:39 libibverbs-41mlnx1-OFED.4.1.0.1.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    87870 Jun 27 17:38 libibverbs-41mlnx1-OFED.4.1.0.1.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   175950 Jun 27 17:39 libibverbs-devel-41mlnx1-OFED.4.1.0.1.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel   175918 Jun 27 17:38 libibverbs-devel-41mlnx1-OFED.4.1.0.1.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    51542 Jun 27 17:39 libibverbs-devel-static-41mlnx1-OFED.4.1.0.1.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    51414 Jun 27 17:38 libibverbs-devel-static-41mlnx1-OFED.4.1.0.1.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   122938 Jun 27 17:39 libibverbs-utils-41mlnx1-OFED.4.1.0.1.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel   124446 Jun 27 17:38 libibverbs-utils-41mlnx1-OFED.4.1.0.1.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    68870 Jun 27 17:50 libmlx4-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm

-r--r--r--. 1 abdel abdel    66642 Jun 27 17:50 libmlx4-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    58786 Jun 27 17:50 libmlx4-devel-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm

-r--r--r--. 1 abdel abdel    60682 Jun 27 17:50 libmlx4-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   130934 Jun 27 17:51 libmlx5-41mlnx1-OFED.4.1.0.1.5.41102.i686.rpm

-r--r--r--. 1 abdel abdel   128794 Jun 27 17:51 libmlx5-41mlnx1-OFED.4.1.0.1.5.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   129798 Jun 27 17:51 libmlx5-devel-41mlnx1-OFED.4.1.0.1.5.41102.i686.rpm

-r--r--r--. 1 abdel abdel   134494 Jun 27 17:51 libmlx5-devel-41mlnx1-OFED.4.1.0.1.5.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    70494 Jun 27 17:53 librdmacm-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm

-r--r--r--. 1 abdel abdel    72106 Jun 27 17:53 librdmacm-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   119514 Jun 27 17:53 librdmacm-devel-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm

-r--r--r--. 1 abdel abdel   127738 Jun 27 17:53 librdmacm-devel-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    84530 Jun 27 17:53 librdmacm-utils-41mlnx1-OFED.4.1.0.1.0.41102.i686.rpm

-r--r--r--. 1 abdel abdel    83722 Jun 27 17:53 librdmacm-utils-41mlnx1-OFED.4.1.0.1.0.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    24714 Jun 27 17:51 librxe-41mlnx1-OFED.4.1.0.1.7.41102.i686.rpm

-r--r--r--. 1 abdel abdel    23854 Jun 27 17:51 librxe-41mlnx1-OFED.4.1.0.1.7.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    12318 Jun 27 17:51 librxe-devel-static-41mlnx1-OFED.4.1.0.1.7.41102.i686.rpm

-r--r--r--. 1 abdel abdel    11982 Jun 27 17:51 librxe-devel-static-41mlnx1-OFED.4.1.0.1.7.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   706362 Jun 27 18:29 libvma-8.3.7-1.x86_64.rpm

-r--r--r--. 1 abdel abdel    15254 Jun 27 18:29 libvma-devel-8.3.7-1.x86_64.rpm

-r--r--r--. 1 abdel abdel    37686 Jun 27 18:29 libvma-utils-8.3.7-1.x86_64.rpm

-r--r--r--. 1 abdel abdel 65966777 Jun 21 09:18 mft-4.7.0-41.x86_64.rpm

-r--r--r--. 1 abdel abdel   114746 Jun 27 18:34 mlnx-ethtool-4.2-1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel 10578239 Jun 27 21:34 mlnx-fw-updater-4.1-1.0.2.0.x86_64.rpm

-r--r--r--. 1 abdel abdel   100742 Jun 27 17:49 mlnx-ofa_kernel-4.1-OFED.4.1.1.0.2.1.gc22af88.fc24.x86_64.rpm

-r--r--r--. 1 abdel abdel  4801566 Jun 27 17:49 mlnx-ofa_kernel-devel-4.1-OFED.4.1.1.0.2.1.gc22af88.fc24.x86_64.rpm

-r--r--r--. 1 abdel abdel   948986 Jun 27 17:49 mlnx-ofa_kernel-modules-4.1-OFED.4.1.1.0.2.1.gc22af88.kver.4.5.5_300.fc24.x86_64.x86_64.rpm

-r--r--r--. 1 abdel abdel     6693 Jun 27 21:33 mlnx-ofed-all-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     4835 Jun 27 21:33 mlnx-ofed-basic-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel    55356 Jun 27 21:33 mlnxofed-docs-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     4869 Jun 27 21:33 mlnx-ofed-dpdk-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     6135 Jun 27 21:34 mlnx-ofed-guest-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     6226 Jun 27 21:34 mlnx-ofed-hpc-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     6263 Jun 27 21:34 mlnx-ofed-hypervisor-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     4537 Jun 27 21:34 mlnx-ofed-kernel-only-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     6541 Jun 27 21:34 mlnx-ofed-vma-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     6245 Jun 27 21:34 mlnx-ofed-vma-eth-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel     6577 Jun 27 21:34 mlnx-ofed-vma-vpi-4.5.5-300.fc24.x86_64-4.1-1.0.2.0.noarch.rpm

-r--r--r--. 1 abdel abdel    28750 Jun 27 18:10 mpi-selector-1.0.3-1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   448546 Jun 27 18:35 mpitests_openmpi-3.2.19-acade41.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   991474 Jun 27 17:40 mstflint-4.7.0-1.6.g26037b7.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel  3782129 Jun 27 18:07 mxm-3.6.3102-1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    45270 Jun 27 17:38 ofed-scripts-4.1-OFED.4.1.1.0.2.x86_64.rpm

-r--r--r--. 1 abdel abdel 13966862 Jun 27 18:25 openmpi-2.1.2a1-1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   818378 Jun 27 17:55 opensm-4.9.0.MLNX20170607.280b8f7-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel   824930 Jun 27 17:54 opensm-4.9.0.MLNX20170607.280b8f7-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   228774 Jun 27 17:55 opensm-devel-4.9.0.MLNX20170607.280b8f7-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel   228778 Jun 27 17:54 opensm-devel-4.9.0.MLNX20170607.280b8f7-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    76098 Jun 27 17:55 opensm-libs-4.9.0.MLNX20170607.280b8f7-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    71238 Jun 27 17:54 opensm-libs-4.9.0.MLNX20170607.280b8f7-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    68354 Jun 27 17:55 opensm-static-4.9.0.MLNX20170607.280b8f7-0.1.41102.i686.rpm

-r--r--r--. 1 abdel abdel    67522 Jun 27 17:54 opensm-static-4.9.0.MLNX20170607.280b8f7-0.1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   254870 Jun 27 17:57 perftest-4.1-0.4.g16dbf63.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    63946 Jun 27 18:05 qperf-0.4.9-9.41102.x86_64.rpm

dr-xr-xr-x. 2 abdel abdel     2048 Jun 27 21:34 repodata

-r--r--r--. 1 abdel abdel  1674339 Jun 27 18:10 sharp-1.3.1.MLNX20170625.859dc24-1.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel   857178 Jun 27 18:33 sockperf-3.1-14.gita9f6056282ef.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel    40270 Jun 27 17:50 srp-4.0-OFED.4.1.1.0.2.1.gc22af88.kver.4.5.5_300.fc24.x86_64.x86_64.rpm

-r--r--r--. 1 abdel abdel    51066 Jun 27 17:57 srptools-41mlnx1-4.41102.x86_64.rpm

-r--r--r--. 1 abdel abdel  2156449 Jun 27 18:08 ucx-1.2.2947-1.41102.x86_64.rpm

 

Thanks.

Re: ConnectX-3 Pro connecting at 10g instead of 40g


The cable was the issue. I got a cable from the list provided, did the loopback test, and the speed went up to 40G. It also increased the speed to 40G between the Windows and VMware servers.

This is the cable I used: link

Thank you.

How to test RDMA traffic congestion


Hi. We're trying to debug issues we see periodically with Lustre Networking on top of CX-3 and CX-4 based RoCE(v1) fabrics using SR-IOV for connections from Lustre clients running as KVM guests (servers are bare-metal). When we hit these errors we see drop/error counters going up on the hosts.

 

So far all simple IB tests between host pairs look OK; now we want to test congestion scenarios, e.g. two hosts sending to one host. However, we've discovered that while ib_write_bw has an option to specify more than one QP, it actually doesn't support it! Is there a simple way to engineer such a test, or are we going to have to write something or move to an MPI-based test suite?
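One simple way to create incast congestion with the stock perftest tools is to run two ib_write_bw server instances on different TCP ports on the target host and start a client on each sender at the same time (a sketch; device names, ports, duration and addresses are placeholders):

On the receiving host:

# ib_write_bw -d mlx4_0 -p 18515 -F --report_gbits &
# ib_write_bw -d mlx4_0 -p 18516 -F --report_gbits &

On sender 1 and sender 2, started together:

# ib_write_bw -d mlx4_0 -p 18515 -F --report_gbits -D 60 <receiver-ip>
# ib_write_bw -d mlx4_0 -p 18516 -F --report_gbits -D 60 <receiver-ip>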


Question about ESXi 6.5 iSER driver with PFC port configuration.


Hi!

I saw a Mellanox iSER driver for ESXi 6.5 released.

But this driver seems to need a Global Pause configuration on each port, as shown in the driver's manual.

 

Is there any solution using PFC configuration on the switch's ports instead?

 

Best regards,

 

P.S

Why does the iSER storage adapter disappear after every ESXi 6.5 host reboot?

 

P.S 02

This Global Pause based iSER initiator can't connect to an SCST Ethernet iSER target.

How can I resolve this problem?

Tested iSER targets:

01. LIO

02. SCST

03. StarWind vSAN

LAG problems


My team is currently standing up a new cluster that has an SN2700 core Ethernet switch on our boot network. LAG links are working fine between this core and the leaf switches in the new cluster. We also have an older cluster with an SX1036 Ethernet switch serving as its core switch. LAG links are also working fine between this older core switch and the older leaf switches in that cluster. Several of us have tried to get LAG working between the SX1036 and SN2700, but we can't get a working link (a single link works fine). We've done the typical troubleshooting, looking for bad cables/ports, etc. We can find no differences when comparing the configuration and status of working LAG links with the failing link.
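For reference, this is roughly what we configure on each end (a minimal MLNX-OS sketch; the port-channel and port numbers are placeholders, and the exact syntax may differ slightly between the 3.4.x and 3.6.x releases):

switch (config) # lacp
switch (config) # interface port-channel 1
switch (config interface port-channel 1) # exit
switch (config) # interface ethernet 1/1 channel-group 1 mode active
switch (config) # interface ethernet 1/2 channel-group 1 mode active
switch (config) # show interfaces port-channel summary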

 

The SX1036 is a PPC switch and is running a much older firmware:

 

Product name:      MLNX-OS

Product release:   3.4.3002

Build ID:          #1-dev

Build date:        2015-07-30 20:13:15

Target arch:       ppc

Target hw:         m460ex

Built by:          jenkins@fit74

Version summary:   PPC_M460EX 3.4.3002 2015-07-30 20:13:15 ppc

 

Product model:     ppc

 

than the SN2700 (X86):

 

Product name:      MLNX-OS

Product release:   3.6.3200

Build ID:          #1-dev

Build date:        2017-03-09 17:55:58

Target arch:       x86_64

Target hw:         x86_64

Built by:          jenkins@e3f42965d5ee

Version summary:   X86_64 3.6.3200 2017-03-09 17:55:58 x86_64

 

Product model:     x86onie

 

The obvious thing to try is updating the firmware on the SX1036, but this cluster is in production and our team is nervous about messing with that core switch, as it's pretty critical to our infrastructure. Would a firmware mismatch cause this behavior?

 

I have seen documentation indicating that MLAG doesn't work between PPC and X86 switches.  I sure hope that's not the case for LAG...

iSER driver for ESXi 6.5 - Does it support ConnectX-3 or not


I saw your iSER driver 1.0.0.1 release notes.

 

 

nmlx4-core 3.16.0.0 is the ESXi 6.5 inbox driver.

After installing this iSER driver 1.0.0.1 for ESXi 6.5, is any more configuration needed?

Issues with setting up Storage Spaces Direct


Hello everyone,

I am working on setting up an S2D cluster and have run into an issue where I am unable to get my nodes to communicate via RDMA. I have used the Test-RDMA.ps1 script and DISKSPD provided in another post in the Mellanox Community.

Here is my hardware configuration:

Configuration: 4 nodes, each with the following configuration

Hardware:       Intel R2224WTTYSR Server Systems

                        256GB Samsung DDR4 LRDIMMs

                        2x Intel E5-2620 v4 Xeon CPU

                        1x Mellanox ConnectX4 - MCX414A-BCAT

                        1x Broadcom LSI 3805-24i HBA

                        2x Intel DC P3700 800GB for Journal\cache drives

                        4x Seagate 2TB SAS HDs for Capacity drives

Networking:    1x Netgear 10GbE network switch for VMs

                        2x Mellanox SX1012 12 Port QSFP28 Switch for RDMA\cluster Traffic

                        8x MC2210128-003 Mellanox LinkX Cables

We are not using SET teams; the ConnectX-4 NICs are used only for RoCE storage traffic.

 

All nodes are set up with this configuration for the RDMA-enabled NICs:

Attached is the configuration of my Mellanox SX1012 switch.

 

Any help in the right direction is very appreciated.

 

Thanks!

ceph + rdma error: ibv_open_device failed


I followed this doc:

Bring Up Ceph RDMA - Developer's Guide

But mon could not start with this error:

7f5acb890700 -1 Infiniband Device open rdma device failed. (2) No such file or directory

 

I checked the Ceph code:

116  name = ibv_get_device_name(device);

117  ctxt = ibv_open_device(device);

118  if (ctxt == NULL) {

119    lderr(cct) << __func__ << " open rdma device failed. " << cpp_strerror(errno) << dendl;

120    ceph_abort();

121  }

 

Then

gdb info:

Breakpoint 1, Device::Device (this=0x55555f3597e0, cct=0x55555eec01c0, d=<optimized out>)

    at /usr/src/debug/ceph-12.2.0/src/msg/async/rdma/Infiniband.cc:116

116  name = ibv_get_device_name(device);

(gdb) print *(struct ibv_device *) device

$7 = {ops = {alloc_context = 0x0, free_context = 0x0}, node_type = IBV_NODE_CA, transport_type = IBV_TRANSPORT_IB,

  name = "mlx4_0", '\000' <repeats 57 times>, dev_name = "uverbs0", '\000' <repeats 56 times>,

  dev_path = "/sys/class/infiniband_verbs/uverbs0", '\000' <repeats 220 times>,

  ibdev_path = "/sys/class/infiniband/mlx4_0", '\000' <repeats 227 times>}

 

Breakpoint 2, Device::Device (this=0x55555f3597e0, cct=0x55555eec01c0, d=<optimized out>)

    at /usr/src/debug/ceph-12.2.0/src/msg/async/rdma/Infiniband.cc:117

117  ctxt = ibv_open_device(device);

(gdb) print *(struct CephContext *) ctxt

Cannot access memory at address 0x646f6e2f305f3478

 

It seems that ibv_open_device failed

 

# ibstat

CA 'mlx4_0'

CA type: MT26428

Number of ports: 1

Firmware version: 2.9.1000

Hardware version: b0

Node GUID: 0x0002c90300589efc

System image GUID: 0x0002c90300589eff

Port 1:

State: Active

Physical state: LinkUp

Rate: 40

Base lid: 33

LMC: 0

SM lid: 23

Capability mask: 0x0251086a

Port GUID: 0x0002c90300589efd

Link layer: InfiniBand

 

Is there any problem with the data of struct ibv_device?
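A quick way to confirm whether the verbs device can be opened at all outside Ceph is to use the standard libibverbs utilities (a sanity-check sketch; mlx4_0 matches the ibstat output above):

# ibv_devices                        # list verbs devices visible from user space
# ibv_devinfo -d mlx4_0              # calls ibv_open_device() internally and dumps the device attributes
# ls -l /dev/infiniband/uverbs0      # the character device that ibv_open_device() needs to open

If these succeed from a shell but ceph-mon still fails, it may be worth checking whether the daemon's environment can actually see /dev/infiniband/uverbs0, since the "(2) No such file or directory" in the log is the errno left by the failed open.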

Re: iSER driver for ESXi 6.5 - Does it support ConnectX-3 or not


Here are some links that will guide you through the proper installation and implementation of iSER v1.0.0.1 along with the ESXi 6.5 ConnectX-3/Pro inbox driver v3.16.0.0:

  1. The inbox driver v3.16.0.0 supports only ConnectX-3/Pro adapters
  2. The iSER driver supports all ConnectX-3, ConnectX-4 & ConnectX-5 adapters
  3. iSER driver:

- Release-Note:  http://www.mellanox.com/related-docs/prod_software/Mellanox_MLNX-NATIVE-ESX-iSER_Driver_for_VMware_ESXi_6.5_Release_Notes_v1.0.0.1.pdf

- Instructions on how to install are in the Quick start guide:  http://www.mellanox.com/related-docs/prod_software/Mellanox_MLNX-NATIVE-ESX-iSER_Driver_for_VMware_ESXi_6.5_Quick_Start_Guide_v1.0.0.1.pdf

  4. As for the inbox driver: it is part of the ESXi 6.5 OS and can be revealed by running: # esxcli software vib list
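A typical install-and-verify sequence looks roughly like this (a sketch only, not a substitute for the quick start guide; the depot filename is a placeholder for the bundle you download):

# esxcli software vib install -d /tmp/<MLNX-NATIVE-ESX-ISER-bundle>.zip
# reboot
# esxcli software vib list | grep -i iser
# esxcli rdma device list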

 

Hope this helps

Re: iSER driver for ESXi 6.5 - Does it support ConnectX-3 or not


Your guide shows me an old ESXi 6.0 interface figure and an old Global Pause switch configuration.

 

Do you have a real-world RoCE v1 guide beyond the iSER Quick Start Guide on your site?

 

Best regards,

Jae-Hoon Choi


How do I disable FEC for MCX416A-CCAT on Windows


I have an MCX416A-CCAT. I already set the speed to 100Gb and disabled auto-negotiation, and I was wondering whether there is a way to disable FEC.

Re: ConnectX-3 WinOF 5.35 on Win2016 Multiple Partitions


I am not sure if this applies, but Linux has a max IPoIB MTU of 2048 (2044) while Windows has a max of 4096 (4092).

 

When connecting the two different OSes you must use the smaller MTU, 2044. This normally must be configured manually on the Windows servers. In the past, failure to set a common MTU yielded poor performance or connectivity issues.
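On the Linux side the effective values can be checked with something like this (a sketch; ib0 and mlx4_0 are placeholder names):

# ip link show ib0 | grep mtu                # IPoIB interface MTU (2044 in datagram mode on a 2K fabric)
# ibv_devinfo -d mlx4_0 | grep active_mtu    # MTU negotiated on the IB port itself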

 

It has been a while since we used both together so things may have changed.

Re: Issues with setting up Storage Spaces Direct


We are using S2D with IB on ConnectX-3 cards. No problems.

 

I am not familiar with the Test-RDMA script.

 

How are you testing RDMA? I use Windows Performance Monitor. There are RDMA-specific counters which make it easy to track RDMA traffic. Other than setup and drivers, there was not much else to do.

 

Have you installed the latest drivers for your cards?

IB Switch IS5035 MTU Setting?


I am wondering if someone can help with this.

 

We have 2x Mellanox IS5035 Switches running SM.

 

I have set the MTU on all of our Mellanox ConnectX-3 adapters on our Windows 2012 and 2016 servers to 4092.

We get the following error in the Windows Server 2016 event log:

"According to the configuration under the "Jumbo Packets" advanced property, the MTU configured for device Mellanox ConnectX-3 IPoIB Adapter is 4092. The effective MTU is the supplied value + 4 bytes (for the IPoIB header). This configuration exceeds the MTU reported by OpenSM, which is 2048. This inconsistency may result in communication failures. Please change the MTU of IPoIB or OpenSM, and restart the driver."

 

In the IS5035 under Fabric MGMT > Partitions the Default MTU is set to Default.
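(For reference, with a host-based OpenSM the equivalent knob is the default-partition MTU in /etc/opensm/partitions.conf; a minimal sketch, where mtu=5 is the code for 4096 bytes and mtu=4 for 2048:)

Default=0x7fff, ipoib, mtu=5 : ALL=full;

OpenSM typically has to be restarted, or sent a SIGHUP, to pick up the change.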

 

Is this the correct place to change the SM MTU?

Will it take effect right away?

Will there be any interruption in traffic? These are production servers, so I need to know the impact.

 

Thanks!

 

Todd

Re: RDS-TOOLS PACKAGE ON MLNX_OFED_LINUX-4.1-1.0.2.0-fc24-x86_64.iso ?


Hi Minouche S.A.

 

Thank you for contacting the Mellanox Community and posting your question.

 

Unfortunately, as mentioned in the Mellanox OFED Driver Release Notes ( http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_4_1-1_0_2_0.pdf ), Section 2.2 "Unsupported Functionalities/Features/HCAs", Mellanox does not support RDS. Therefore, no RDS module and/or RDS-related tools are supplied with the latest Mellanox OFED driver. Some older driver versions may contain a version of the RDS module, but Mellanox still does not support its use.

 

For support and use of RDS, the OS vendor's inbox driver and tools should be used.
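(For example, on a distribution that packages rds-tools, the inbox route looks roughly like this; a sketch only, assuming the package and the inbox rds/rds_rdma modules are available:)

# dnf install rds-tools        # if the distribution provides the package
# modprobe rds_rdma            # the rds module is pulled in as a dependency
# rds-ping <remote-ip>
# rds-stress                   # passive side; see the man page for the active-side options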

 

Thanks.

 

Cheers,

~Martijn
