Channel: Mellanox Interconnect Community: Message List

Re: Line rate using ConnectX-5 100G EN in Ubuntu; PCIe speed difference


Hi Arvind,

 

I corrected my earlier answer with the correct links and added an extra link regarding our NICs' performance with DPDK 18.02.

 

Many thanks.

~Mellanox Technical Support


Re: ConnectX-3 VFs communication

RoCE - Cisco Catalyst 4506 switch


Hello Everyone,

 

Is it possible to set up RoCE using a Cisco Catalyst 4506 switch and servers with Mellanox ConnectX-5 NICs?

 

Thanks,

Suraj Gour

Can anybody provide steps on how to run RoCE over VXLAN?


Can anybody provide steps on how to run RoCE traffic over VXLAN on any Linux OS?

Re: RDMA_CM_EVENT_ROUTE_ERROR


Without a reproduction it is impossible to resolve this. However, RDMA_CM_EVENT_ROUTE_ERROR usually means that there is no route to the specific host. You can verify this with a simple 'ping'. Analyze the routing table on your host (it may have duplicate entries), and if you are using a dual-port card, check whether disconnecting one port makes the issue go away.

rdma_resolve_route depends on the OS kernel routing, and if that doesn't work, RDMA route resolution will fail.

  • Verify that you are using the latest version of the Mellanox OFED stack.
  • Be sure to use the latest firmware version.
  • Be sure you are using the subnet manager that comes with the Mellanox OFED stack.
  • Check the output of the ibv_devinfo command and make sure the node GUIDs are not all zeros.
  • Check dmesg/syslog; you may find additional information that helps.
  • As an additional diagnostic, try a perftest tool such as ib_read_lat with the '-R' flag (sketched below). If it works and your application doesn't, check the application code.
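
A minimal command-line sketch of these checks (the peer address, interface, and device names below are examples, not taken from the original report):

  # Basic reachability and kernel routing (example peer address)
  ping -c 3 192.168.10.2
  ip route get 192.168.10.2
  # Port state and node/port GUIDs (should not be all zeros)
  ibv_devinfo -v | grep -iE 'state|guid'
  # Kernel log hints
  dmesg | grep -iE 'mlx|rdma'
  # RDMA CM based latency test from the perftest package:
  #   on the server:  ib_read_lat -R
  #   on the client:  ib_read_lat -R 192.168.10.2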

Re: RoCEv2 PFC/ECN Issues

ConnectX-5 error: Failed to write to /dev/nvme-fabrics: Invalid cross-device link


I have 2 ConnectX-5 NICs in my PC (Ubuntu 18.04, kernel 4.15.0-36). They are on 2 different subnets (192.168.1.100/24, 192.168.2.100/24). I have 4 NVMe-oF targets and I try to connect to them from my PC:

 

sudo nvme connect -t rdma -a 192.168.2.52 -n nqn.2018-09.com.52 -s 4420

sudo nvme connect -t rdma -a 192.168.1.9 -n nqn.2018-09.com.9 -s 4420

sudo nvme connect -t rdma -a 192.168.2.54 -n nqn.2018-09.com.54 -s 4420

sudo nvme connect -t rdma -a 192.168.1.2 -n nqn.2018-09.com.2 -s 4420

Failed to write to /dev/nvme-fabrics: Invalid cross-device link

 

I disconnect all these targets and reboot the PC. Then I try to connect to these targets in a different order:

 

sudo nvme connect -t rdma -a 192.168.1.2 -n nqn.2018-09.com.2 -s 4420

sudo nvme connect -t rdma -a 192.168.1.9 -n nqn.2018-09.com.9 -s 4420

sudo nvme connect -t rdma -a 192.168.2.52 -n nqn.2018-09.com.52 -s 4420

Failed to write to /dev/nvme-fabrics: Invalid cross-device link

 

I googled a bit; there seem to be two reported instances of this error message related to Mellanox NICs, but I don't understand the nature of the error, and I don't see any workaround. Any suggestions? Here's some info from my PC.

 

 

yao@Host1:~$ lspci | grep Mellan

15:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

21:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

 

yao@Host1:~$ lspci -vvv -s 15:00.0

15:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5]

Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+

Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

Latency: 0, Cache Line Size: 32 bytes

Interrupt: pin A routed to IRQ 33

NUMA node: 0

Region 0: Memory at 387ffe000000 (64-bit, prefetchable) [size=32M]

Expansion ROM at 90500000 [disabled] [size=1M]

Capabilities: <access denied>

Kernel driver in use: mlx5_core

Kernel modules: mlx5_core

 

yao@Host1:~$ sudo lsmod | grep mlx

mlx5_ib               196608  0

ib_core               225280  9 ib_cm,rdma_cm,ib_umad,nvme_rdma,ib_uverbs,iw_cm,mlx5_ib,ib_ucm,rdma_ucm

mlx5_core             544768  1 mlx5_ib

mlxfw                  20480  1 mlx5_core

devlink                45056  1 mlx5_core

ptp                    20480  2 e1000e,mlx5_core

 

yao@Host1:~$ modinfo mlx5_core

filename:       /lib/modules/4.15.0-36-generic/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko

version:        5.0-0

license:        Dual BSD/GPL

description:    Mellanox Connect-IB, ConnectX-4 core driver

author:         Eli Cohen <eli@mellanox.com>

srcversion:     C271CE9036D77E924A8E038

alias:          pci:v000015B3d0000A2D3sv*sd*bc*sc*i*

alias:          pci:v000015B3d0000A2D2sv*sd*bc*sc*i*

alias:          pci:v000015B3d0000101Csv*sd*bc*sc*i*

alias:          pci:v000015B3d0000101Bsv*sd*bc*sc*i*

alias:          pci:v000015B3d0000101Asv*sd*bc*sc*i*

alias:          pci:v000015B3d00001019sv*sd*bc*sc*i*

alias:          pci:v000015B3d00001018sv*sd*bc*sc*i*

alias:          pci:v000015B3d00001017sv*sd*bc*sc*i*

alias:          pci:v000015B3d00001016sv*sd*bc*sc*i*

alias:          pci:v000015B3d00001015sv*sd*bc*sc*i*

alias:          pci:v000015B3d00001014sv*sd*bc*sc*i*

alias:          pci:v000015B3d00001013sv*sd*bc*sc*i*

alias:          pci:v000015B3d00001012sv*sd*bc*sc*i*

alias:          pci:v000015B3d00001011sv*sd*bc*sc*i*

depends:        devlink,ptp,mlxfw

retpoline:      Y

intree:         Y

name:           mlx5_core

vermagic:       4.15.0-36-generic SMP mod_unload

signat:         PKCS#7

signer:        

sig_key:       

sig_hashalgo:   md4

parm:           debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)

parm:           prof_sel:profile selector. Valid range 0 - 2 (uint)

 

 

yao@Host1:~$ dmesg

...

[   78.772669] nvme nvme0: queue_size 128 > ctrl maxcmd 64, clamping down

[   78.856378] nvme nvme0: creating 8 I/O queues.

[   88.297468] nvme nvme0: new ctrl: NQN "nqn.2018-09.com.52", addr 192.168.2.52:4420

[  101.561197] nvme nvme1: queue_size 128 > ctrl maxcmd 64, clamping down

[  101.644852] nvme nvme1: creating 8 I/O queues.

[  111.083806] nvme nvme1: new ctrl: NQN "nqn.2018-09.com.9", addr 192.168.1.9:4420

[  151.368016] nvme nvme2: queue_size 128 > ctrl maxcmd 64, clamping down

[  151.451717] nvme nvme2: creating 8 I/O queues.

[  160.893710] nvme nvme2: new ctrl: NQN "nqn.2018-09.com.54", addr 192.168.2.54:4420

[  169.789368] nvme nvme3: queue_size 128 > ctrl maxcmd 64, clamping down

[  169.873068] nvme nvme3: creating 8 I/O queues.

[  177.657661] nvme nvme3: Connect command failed, error wo/DNR bit: -16402

[  177.657669] nvme nvme3: failed to connect queue: 4 ret=-18

[  177.951379] nvme nvme3: Reconnecting in 10 seconds...

[  188.138167] general protection fault: 0000 [#1] SMP PTI

[  188.138172] Modules linked in: nvme_rdma rdma_ucm rdma_cm nvme_fabrics nvme_core ib_ucm ib_uverbs ib_umad iw_cm ib_cm nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec aes_x86_64 crypto_simd glue_helper cryptd snd_hda_core snd_hwdep intel_cstate snd_pcm cp210x snd_seq_midi snd_seq_midi_event joydev input_leds snd_rawmidi usbserial snd_seq snd_seq_device snd_timer snd mei_me soundcore wmi_bmof hp_wmi sparse_keymap ioatdma mac_hid intel_rapl_perf mei dca intel_wmi_thunderbolt shpchp serio_raw sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 mlx5_ib ib_core amdgpu chash hid_generic usbhid hid

[  188.138248]  radeon i2c_algo_bit ttm mlx5_core drm_kms_helper syscopyarea e1000e sysfillrect mlxfw sysimgblt devlink ahci fb_sys_fops ptp psmouse drm pps_core libahci wmi

[  188.138272] CPU: 0 PID: 390 Comm: kworker/u56:7 Not tainted 4.15.0-36-generic #39-Ubuntu

[  188.138275] Hardware name: HP HP Z4 G4 Workstation/81C5, BIOS P62 v01.51 05/08/2018

[  188.138283] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]

[  188.138290] RIP: 0010:nvme_rdma_alloc_queue+0x3c/0x190 [nvme_rdma]

[  188.138294] RSP: 0018:ffffc04c041e3e08 EFLAGS: 00010286

[  188.138298] RAX: 0000000000000000 RBX: 890a8eecb83679a9 RCX: ffff9f9b5ec10820

[  188.138301] RDX: ffffffffc0cd5600 RSI: ffffffffc0cd43ab RDI: ffff9f9ad037c000

[  188.138304] RBP: ffffc04c041e3e28 R08: 000000000000020c R09: 0000000000000000

[  188.138307] R10: 0000000000000000 R11: 000000000000020f R12: ffff9f9ad037c000

[  188.138309] R13: 0000000000000000 R14: 0000000000000020 R15: 0000000000000000

[  188.138313] FS:  0000000000000000(0000) GS:ffff9f9b5f200000(0000) knlGS:0000000000000000

[  188.138316] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[  188.138319] CR2: 00007f347e159fb8 CR3: 00000001a740a006 CR4: 00000000003606f0

[  188.138323] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[  188.138325] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[  188.138327] Call Trace:

[  188.138335]  nvme_rdma_configure_admin_queue+0x22/0x2d0 [nvme_rdma]

[  188.138341]  nvme_rdma_reconnect_ctrl_work+0x27/0xd0 [nvme_rdma]

[  188.138349]  process_one_work+0x1de/0x410

[  188.138354]  worker_thread+0x32/0x410

[  188.138361]  kthread+0x121/0x140

[  188.138365]  ? process_one_work+0x410/0x410

[  188.138370]  ? kthread_create_worker_on_cpu+0x70/0x70

[  188.138378]  ret_from_fork+0x35/0x40

[  188.138381] Code: 89 e5 41 56 41 55 41 54 53 48 8d 1c c5 00 00 00 00 49 89 fc 49 89 c5 49 89 d6 48 29 c3 48 c7 c2 00 56 cd c0 48 c1 e3 04 48 03 1f <48> 89 7b 18 48 8d 7b 58 c7 43 50 00 00 00 00 e8 50 05 40 ce 45

[  188.138443] RIP: nvme_rdma_alloc_queue+0x3c/0x190 [nvme_rdma] RSP: ffffc04c041e3e08

[  188.138447] ---[ end trace c9efe5e9bc3591f2 ]---

 

yao@Host1:~$ dmesg | grep mlx

[    2.510581] mlx5_core 0000:15:00.0: enabling device (0100 -> 0102)

[    2.510732] mlx5_core 0000:15:00.0: firmware version: 16.21.2010

[    4.055064] mlx5_core 0000:15:00.0: Port module event: module 0, Cable plugged

[    4.061558] mlx5_core 0000:21:00.0: enabling device (0100 -> 0102)

[    4.061775] mlx5_core 0000:21:00.0: firmware version: 16.21.2010

[    4.966172] mlx5_core 0000:21:00.0: Port module event: module 0, Cable plugged

[    4.972503] mlx5_core 0000:15:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(64) RxCqeCmprss(0)

[    5.110943] mlx5_core 0000:21:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(64) RxCqeCmprss(0)

[    5.247925] mlx5_core 0000:15:00.0 enp21s0: renamed from eth0

[    5.248600] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0

[    5.275912] mlx5_core 0000:21:00.0 enp33s0: renamed from eth1

[   23.736990] mlx5_core 0000:21:00.0 enp33s0: Link up

[   23.953415] mlx5_core 0000:15:00.0 enp21s0: Link up

[  188.138172] Modules linked in: nvme_rdma rdma_ucm rdma_cm nvme_fabrics nvme_core ib_ucm ib_uverbs ib_umad iw_cm ib_cm nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec aes_x86_64 crypto_simd glue_helper cryptd snd_hda_core snd_hwdep intel_cstate snd_pcm cp210x snd_seq_midi snd_seq_midi_event joydev input_leds snd_rawmidi usbserial snd_seq snd_seq_device snd_timer snd mei_me soundcore wmi_bmof hp_wmi sparse_keymap ioatdma mac_hid intel_rapl_perf mei dca intel_wmi_thunderbolt shpchp serio_raw sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 mlx5_ib ib_core amdgpu chash hid_generic usbhid hid

[  188.138248]  radeon i2c_algo_bit ttm mlx5_core drm_kms_helper syscopyarea e1000e sysfillrect mlxfw sysimgblt devlink ahci fb_sys_fops ptp psmouse drm pps_core libahci wmi

[  662.506623] Modules linked in: cfg80211 nvme_rdma rdma_ucm rdma_cm nvme_fabrics nvme_core ib_ucm ib_uverbs ib_umad iw_cm ib_cm nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec aes_x86_64 crypto_simd glue_helper cryptd snd_hda_core snd_hwdep intel_cstate snd_pcm cp210x snd_seq_midi snd_seq_midi_event joydev input_leds snd_rawmidi usbserial snd_seq snd_seq_device snd_timer snd mei_me soundcore wmi_bmof hp_wmi sparse_keymap ioatdma mac_hid intel_rapl_perf mei dca intel_wmi_thunderbolt shpchp serio_raw sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 mlx5_ib ib_core amdgpu chash hid_generic

Re: Support for "INBOX drivers?" for 18.04/connected mode?


Hi Bill,

 

Here is the link to the Release Notes for the Ubuntu 18.04 Inbox Driver. Section 2, Changes and New Features, lists the support for Enhanced IPoIB for ConnectX-4 cards. Also, I am attaching the link for the User Manual for your reference. Which card are you using? Also, you mentioned you are receiving the following error: [   57.573664] ib_ipoib: unknown parameter 'ipoib_enhanced' ignored. Is this after testing with MLNX_OFED? If yes, which OFED version did you test it with?

 

http://www.mellanox.com/pdf/prod_software/Ubuntu_18.04_Inbox_Driver_Release_Notes.pdf

 

http://www.mellanox.com/pdf/prod_software/Ubuntu_18_04_Inbox_Driver_User_Manual.pdf
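
For reference, a quick way to confirm which stack and options are actually in use (a sketch; ofed_info is only present when MLNX_OFED is installed):

  # Report the installed MLNX_OFED version, if any
  ofed_info -s
  # List the parameters the loaded ib_ipoib module actually exposes;
  # if 'ipoib_enhanced' is not listed, the option is ignored at load time
  modinfo ib_ipoib | grep -i parm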


Re: Line rate using ConnectX-5 100G EN in Ubuntu; PCIe speed difference


I am not familiar with pktgen, so I may be missing the point, but what really catches my eye is your 52G throughput on the TX side. That's exactly the same rate I saw when I ran an FIO test with ConnectX-5, and I am puzzled why it doesn't come close to line rate (see Bad RoCEv2 throughput with ConnectX-5). I am using the stock Linux driver instead of Mellanox OFED.
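
Since the thread title mentions a PCIe speed difference, one thing worth ruling out (a hedged suggestion; the BDF and interface name below are examples) is a downtrained PCIe link: a 100GbE port needs roughly a Gen3 x16 link, and a x8 or Gen2 link caps throughput well below line rate.

  # Compare the supported vs. negotiated PCIe link speed and width
  sudo lspci -s 15:00.0 -vv | grep -E 'LnkCap|LnkSta'
  # Confirm the negotiated Ethernet speed as well
  ethtool enp21s0 | grep Speed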

Re: Support for "INBOX drivers?" for 18.04/connected mode?


Hi Bill,

 

 

 

Here is the link to the Release Notes for Ubuntu 18.04 Inbox Driver. Section 2, Changes and New Features, lists the support for Enhanced IPoIB for ConnectX-4 cards. Also, I am attaching the link for the User Manual for your reference.

 

The documentation was quite helpful, thanks.

 

Which card are you using?

 

root@k1:~# lspci | grep -i mellanox

06:00.0 Infiniband controller: Mellanox Technologies MT27700 Family

 

 

Also, you mentioned you are receiving the following error: ib_ipoib: unknown parameter 'ipoib_enhanced' ignored. Is this after testing with MLNX_OFED? If yes, which OFED version did you test it with?

 

I tried what comes with Ubuntu 18.04 and with "mlnx-en-4.4-2.0.7.0-ubuntu18.04-x86_64". I'm open to a solution with either.

 

Do you have any recommendations for getting IPoIB connected mode to work with either the native Ubuntu 18.04 drivers or the MLNX_OFED drivers?

 

I tried this because it was recommended in the Mellanox forum:

  root@k2:/etc/modprobe.d# cat ib_ipoib.conf

  options ib_ipoib ipoib_enhanced=0

 

But when I boot, the above-mentioned error shows up.

 

When I install the MLNX_OFED driver, I don't see any errors.  Here's the summary:

Device #1:

Re: Ubuntu Connected mode not working


From what I can tell, this no longer works with Ubuntu 18.04. When I try this solution, I get the following during boot:

[ 4128.198929] ib_ipoib: unknown parameter 'ipoib_enhanced' ignored

 

I tried this with the native 18.04 drivers and after installing mlnx-en-4.4-2.0.7.0-ubuntu18.04-x86_64.tgz.
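
A sketch of how to check this on a running system (the interface name ib0 is an example; the sysfs 'mode' file is the standard IPoIB toggle). Note that if the loaded driver only supports the enhanced/datagram path, writing 'connected' will fail:

  # Does the loaded ib_ipoib expose the 'ipoib_enhanced' parameter at all?
  modinfo ib_ipoib | grep -i parm
  # Current IPoIB mode, and an attempt to switch to connected mode at runtime
  cat /sys/class/net/ib0/mode
  echo connected | sudo tee /sys/class/net/ib0/mode
  # Connected mode allows a large MTU (up to 65520)
  sudo ip link set ib0 mtu 65520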

Re: Support for "INBOX drivers?" for 18.04/connected mode?


I had the wrong package installed; MLNX_OFED fixed the problem and connected mode now works.

 

Is there any way to get connected mode working with the drivers that come with Ubuntu 18.04?

ConnectX®-4 EN Adapter Card support for NetApp EF570


Hi,

 

I'm designing the next-generation back-end storage network for one of our customers, where I'm recommending NVMe with the RoCE protocol over 100 Gb Ethernet.

Please, can anyone help me validate the below - is the ConnectX-4 EN compatible with the NetApp EF570?

 

Networking Gear:

Spine switch       –  Arista 7060CX2-32S

Leaf Switch         – Arista 7280CR-48

 

Storage:

NetApp EF570 (NIC – Mellanox ConnectX®-4 EN Adapter Card, Dual-Port 100 Gigabit Ethernet Adapter)

I would really appreciate a response at the earliest.

 

Nabeel

 

Re: mlx5: ethtool -m not working


Thanks for the firmware note. I was incorrect about the firmware on the ConnectX-4 being the latest. After updating to the latest firmware, ethtool -m works as expected.
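
For anyone landing here later, a short sketch of the related checks (the interface name is an example; mlxfwmanager is part of the Mellanox firmware tools and only present if installed):

  # Driver and firmware version as seen by the kernel
  ethtool -i enp21s0
  # Read the cable/transceiver module EEPROM (this is what failed before the update)
  ethtool -m enp21s0
  # Query firmware status with the Mellanox firmware tools
  sudo mlxfwmanager --query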

Re: Not able to offload tc flows to hardware with Linux tc


Symbol Errors


How do I calculate symbol errors? I need the formula for calculating symbol errors.

 

Any help?

ConnectX-4 Lx: when will the driver be released for Windows Server 2019?


We are planning to set up iSER interop testing with the ConnectX-4 Lx on the latest Windows Server 2019.

Does anyone know when the driver/fw will be released for Win2019? Thanks!

Setting up FDR infiniband


Hi,

Please bear with me - I am a real beginner with InfiniBand interconnects and need some help.

I want to set up FDR InfiniBand for an HPC cluster. I haven't ordered the parts yet, but I am considering the following:

  • Mellanox FDR single-port adapters (CX383A, 56 Gbps) on Dell C6220 servers [I think this is actually FDR10 at 40 Gbps!].
  • Mellanox FDR switch MSX6036F (56 Gbps).

First of all, my interest is latency more than bandwidth. First questions:

  • Should I buy QSFP FDR cables, or would QSFP QDR cables work correctly?
  • On the card and switch descriptions the ports are listed as QSFP, but I see that there are also QSFP+ cables; would they work too, or are they simply the same?

 

On the other hand, regarding software and OS: the cluster is running very well with 10GbE SFP+ on Linux CentOS. Would setting up an InfiniBand interconnect be much more complicated (things to do and things to avoid)?

 

I certainly have other questions, but that's it for now.

Thanks for the help

BR

Re: Can anybody provide steps on how to run RoCE over VXLAN?


Hi,

Before going with RoCE: are you able to run TCP/IP traffic (including UDP and ICMP) over VXLAN? RoCE v2 is based on UDP.
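
A minimal sketch of that first step with iproute2 (the VNI, device, and addresses are examples). Once plain IP/ICMP/UDP works over the overlay, an RDMA CM based tool such as ib_send_bw -R can be tried against the overlay addresses, provided the NIC/driver supports RoCE over VXLAN:

  # Host A: create a VXLAN overlay on top of the physical port
  ip link add vxlan0 type vxlan id 100 dev enp21s0 remote 192.168.2.2 dstport 4789
  ip addr add 10.10.10.1/24 dev vxlan0
  ip link set vxlan0 up
  # (mirror the setup on host B with remote 192.168.2.1 and address 10.10.10.2/24)
  # Verify plain IP/ICMP/UDP over the overlay first
  ping -c 3 10.10.10.2
  iperf3 -c 10.10.10.2 -u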

Re: ConnectX®-4 EN Adapter Card support for NetApp EF570


Hi Mohammed,

Mellanox HCAs are compatible with the relevant specs, protocols, etc. When asking about compatibility with an external appliance, it is better to ask whether specific features are supported by the adapter. Are you looking for RoCE support - it is supported; NVMe - it is supported. If your question is whether the NetApp EF570 supports a specific piece of hardware such as a Mellanox HCA, that question should be addressed to NetApp, as they use their own version of the driver and firmware and may have their own hardware compatibility matrix.
