Hi Arvind,
I have corrected my earlier answer with the right links and added an extra link regarding our NICs' performance with DPDK 18.02.
Many thanks.
~Mellanox Technical Support
Hi Jorge,
Many thanks for posting your question on the Mellanox Community.
Can you please check the following link on how to set a VF's network attributes -> HowTo Set Virtual Network Attributes on a Virtual Function (SR-IOV).
If this does not resolve the issue, please post your SR-IOV config and network configurations.
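For illustration only (the PF name enp21s0 and the VF index/values below are placeholders, not taken from your setup), the standard iproute2 way to set a VF's MAC and VLAN from the hypervisor looks like this:
ip link set enp21s0 vf 0 mac 00:11:22:33:44:55
ip link set enp21s0 vf 0 vlan 100
ip link show enp21s0     # the vf 0 line should now show the new MAC and VLAN
Please refer to the linked article for the full procedure.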
Many thanks.
~Mellanox Technical Support
Hello Everyone,
Is it possible to set up RoCE using a Cisco Catalyst 4506 switch and servers with Mellanox ConnectX-5 NICs?
Thanks,
Suraj Gour
Can anybody provide steps on how to run RoCE traffic over VXLAN on any Linux OS?
Without a reproduction it is impossible to resolve this. However, RDMA_CM_EVENT_ROUTE_ERROR usually means that there is no route to the specific host. You can verify this with a simple 'ping' command. Analyze the routing table on your host (maybe it has duplicate entries), and if you are using a dual-port card, check whether disconnecting one port makes the issue go away.
rdma_resolve_route depends on the OS kernel routing table, and if that is broken, RDMA route resolution will fail.
Verify that you are using the latest version of the Mellanox OFED stack.
Be sure you are using the latest firmware version.
Be sure you are using the subnet manager that comes with the Mellanox OFED stack.
Check the output of the ibv_devinfo command and make sure the node GUIDs are not all zeros.
Check 'dmesg'/syslog; you may find additional information there that helps.
As an additional diagnostic, run a test application such as 'ib_read_lat' with the '-R' flag. If that works and your application doesn't, check the application code. A minimal sketch of these checks follows below.
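A minimal sketch of these checks (the remote address <remote_ip> is a placeholder, and ib_read_lat comes from the perftest package):
ping <remote_ip>                    # basic IP reachability to the remote host
ip route show                       # look for duplicate or missing routes
ibv_devinfo | grep -i guid          # node/port GUIDs must not be all zeros
dmesg | grep -iE 'mlx|rdma'         # driver and RDMA messages
ib_read_lat -R                      # on the server side
ib_read_lat -R <remote_ip>          # on the client side; -R uses rdma_cm for connection setup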
Hi Robert,
Please follow this link - Recommended Network Configuration Examples for RoCE Deployment - to configure the host and the switch. When using a non-Mellanox switch, check with the switch vendor for the corresponding commands.
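As a rough host-side sketch only (the interface and device names and the priority/DSCP values are assumptions; the linked article and your installed MLNX_OFED version are authoritative), a typical lossless-RoCE setup uses the MLNX_OFED tools:
mlnx_qos -i enp21s0 --trust dscp              # trust DSCP marking on the port
mlnx_qos -i enp21s0 --pfc 0,0,0,1,0,0,0,0     # enable PFC on priority 3 only
cma_roce_mode -d mlx5_0 -p 1 -m 2             # make RDMA CM default to RoCE v2
On the switch side, the equivalent is enabling PFC/ECN on the matching traffic class using the vendor's own commands.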
I have 2 ConnectX-5 NICs in my PC (Ubuntu 18.04, kernel 4.15.0-36). They are in 2 different subnets (192.168.1.100/24, 192.168.2.100/24). I have 4 NVMe-oF targets and I try to connect to them from my PC:
sudo nvme connect -t rdma -a 192.168.2.52 -n nqn.2018-09.com.52 -s 4420
sudo nvme connect -t rdma -a 192.168.1.9 -n nqn.2018-09.com.9 -s 4420
sudo nvme connect -t rdma -a 192.168.2.54 -n nqn.2018-09.com.54 -s 4420
sudo nvme connect -t rdma -a 192.168.1.2 -n nqn.2018-09.com.2 -s 4420
Failed to write to /dev/nvme-fabrics: Invalid cross-device link
I disconnect all these targets and reboot the PC. Then I try to connect to these targets in a different order:
sudo nvme connect -t rdma -a 192.168.1.2 -n nqn.2018-09.com.2 -s 4420
sudo nvme connect -t rdma -a 192.168.1.9 -n nqn.2018-09.com.9 -s 4420
sudo nvme connect -t rdma -a 192.168.2.52 -n nqn.2018-09.com.52 -s 4420
Failed to write to /dev/nvme-fabrics: Invalid cross-device link
I googled a bit. It seems that there are two reported instances of this error message related to Mellanox NICs, but I don't understand the nature of this error and I don't see any workaround. Any suggestions? Here's some info from my PC.
yao@Host1:~$ lspci | grep Mellan
15:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
21:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
yao@Host1:~$ lspci -vvv -s 15:00.0
15:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 33
NUMA node: 0
Region 0: Memory at 387ffe000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at 90500000 [disabled] [size=1M]
Capabilities: <access denied>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
yao@Host1:~$ sudo lsmod | grep mlx
mlx5_ib 196608 0
ib_core 225280 9 ib_cm,rdma_cm,ib_umad,nvme_rdma,ib_uverbs,iw_cm,mlx5_ib,ib_ucm,rdma_ucm
mlx5_core 544768 1 mlx5_ib
mlxfw 20480 1 mlx5_core
devlink 45056 1 mlx5_core
ptp 20480 2 e1000e,mlx5_core
yao@Host1:~$ modinfo mlx5_core
filename: /lib/modules/4.15.0-36-generic/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 5.0-0
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli@mellanox.com>
srcversion: C271CE9036D77E924A8E038
alias: pci:v000015B3d0000A2D3sv*sd*bc*sc*i*
alias: pci:v000015B3d0000A2D2sv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Csv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Bsv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias: pci:v000015B3d00001019sv*sd*bc*sc*i*
alias: pci:v000015B3d00001018sv*sd*bc*sc*i*
alias: pci:v000015B3d00001017sv*sd*bc*sc*i*
alias: pci:v000015B3d00001016sv*sd*bc*sc*i*
alias: pci:v000015B3d00001015sv*sd*bc*sc*i*
alias: pci:v000015B3d00001014sv*sd*bc*sc*i*
alias: pci:v000015B3d00001013sv*sd*bc*sc*i*
alias: pci:v000015B3d00001012sv*sd*bc*sc*i*
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
depends: devlink,ptp,mlxfw
retpoline: Y
intree: Y
name: mlx5_core
vermagic: 4.15.0-36-generic SMP mod_unload
signat: PKCS#7
signer:
sig_key:
sig_hashalgo: md4
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm: prof_sel:profile selector. Valid range 0 - 2 (uint)
yao@Host1:~$ dmesg
...
[ 78.772669] nvme nvme0: queue_size 128 > ctrl maxcmd 64, clamping down
[ 78.856378] nvme nvme0: creating 8 I/O queues.
[ 88.297468] nvme nvme0: new ctrl: NQN "nqn.2018-09.com.52", addr 192.168.2.52:4420
[ 101.561197] nvme nvme1: queue_size 128 > ctrl maxcmd 64, clamping down
[ 101.644852] nvme nvme1: creating 8 I/O queues.
[ 111.083806] nvme nvme1: new ctrl: NQN "nqn.2018-09.com.9", addr 192.168.1.9:4420
[ 151.368016] nvme nvme2: queue_size 128 > ctrl maxcmd 64, clamping down
[ 151.451717] nvme nvme2: creating 8 I/O queues.
[ 160.893710] nvme nvme2: new ctrl: NQN "nqn.2018-09.com.54", addr 192.168.2.54:4420
[ 169.789368] nvme nvme3: queue_size 128 > ctrl maxcmd 64, clamping down
[ 169.873068] nvme nvme3: creating 8 I/O queues.
[ 177.657661] nvme nvme3: Connect command failed, error wo/DNR bit: -16402
[ 177.657669] nvme nvme3: failed to connect queue: 4 ret=-18
[ 177.951379] nvme nvme3: Reconnecting in 10 seconds...
[ 188.138167] general protection fault: 0000 [#1] SMP PTI
[ 188.138172] Modules linked in: nvme_rdma rdma_ucm rdma_cm nvme_fabrics nvme_core ib_ucm ib_uverbs ib_umad iw_cm ib_cm nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec aes_x86_64 crypto_simd glue_helper cryptd snd_hda_core snd_hwdep intel_cstate snd_pcm cp210x snd_seq_midi snd_seq_midi_event joydev input_leds snd_rawmidi usbserial snd_seq snd_seq_device snd_timer snd mei_me soundcore wmi_bmof hp_wmi sparse_keymap ioatdma mac_hid intel_rapl_perf mei dca intel_wmi_thunderbolt shpchp serio_raw sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 mlx5_ib ib_core amdgpu chash hid_generic usbhid hid
[ 188.138248] radeon i2c_algo_bit ttm mlx5_core drm_kms_helper syscopyarea e1000e sysfillrect mlxfw sysimgblt devlink ahci fb_sys_fops ptp psmouse drm pps_core libahci wmi
[ 188.138272] CPU: 0 PID: 390 Comm: kworker/u56:7 Not tainted 4.15.0-36-generic #39-Ubuntu
[ 188.138275] Hardware name: HP HP Z4 G4 Workstation/81C5, BIOS P62 v01.51 05/08/2018
[ 188.138283] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[ 188.138290] RIP: 0010:nvme_rdma_alloc_queue+0x3c/0x190 [nvme_rdma]
[ 188.138294] RSP: 0018:ffffc04c041e3e08 EFLAGS: 00010286
[ 188.138298] RAX: 0000000000000000 RBX: 890a8eecb83679a9 RCX: ffff9f9b5ec10820
[ 188.138301] RDX: ffffffffc0cd5600 RSI: ffffffffc0cd43ab RDI: ffff9f9ad037c000
[ 188.138304] RBP: ffffc04c041e3e28 R08: 000000000000020c R09: 0000000000000000
[ 188.138307] R10: 0000000000000000 R11: 000000000000020f R12: ffff9f9ad037c000
[ 188.138309] R13: 0000000000000000 R14: 0000000000000020 R15: 0000000000000000
[ 188.138313] FS: 0000000000000000(0000) GS:ffff9f9b5f200000(0000) knlGS:0000000000000000
[ 188.138316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 188.138319] CR2: 00007f347e159fb8 CR3: 00000001a740a006 CR4: 00000000003606f0
[ 188.138323] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 188.138325] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 188.138327] Call Trace:
[ 188.138335] nvme_rdma_configure_admin_queue+0x22/0x2d0 [nvme_rdma]
[ 188.138341] nvme_rdma_reconnect_ctrl_work+0x27/0xd0 [nvme_rdma]
[ 188.138349] process_one_work+0x1de/0x410
[ 188.138354] worker_thread+0x32/0x410
[ 188.138361] kthread+0x121/0x140
[ 188.138365] ? process_one_work+0x410/0x410
[ 188.138370] ? kthread_create_worker_on_cpu+0x70/0x70
[ 188.138378] ret_from_fork+0x35/0x40
[ 188.138381] Code: 89 e5 41 56 41 55 41 54 53 48 8d 1c c5 00 00 00 00 49 89 fc 49 89 c5 49 89 d6 48 29 c3 48 c7 c2 00 56 cd c0 48 c1 e3 04 48 03 1f <48> 89 7b 18 48 8d 7b 58 c7 43 50 00 00 00 00 e8 50 05 40 ce 45
[ 188.138443] RIP: nvme_rdma_alloc_queue+0x3c/0x190 [nvme_rdma] RSP: ffffc04c041e3e08
[ 188.138447] ---[ end trace c9efe5e9bc3591f2 ]---
yao@Host1:~$ dmesg | grep mlx
[ 2.510581] mlx5_core 0000:15:00.0: enabling device (0100 -> 0102)
[ 2.510732] mlx5_core 0000:15:00.0: firmware version: 16.21.2010
[ 4.055064] mlx5_core 0000:15:00.0: Port module event: module 0, Cable plugged
[ 4.061558] mlx5_core 0000:21:00.0: enabling device (0100 -> 0102)
[ 4.061775] mlx5_core 0000:21:00.0: firmware version: 16.21.2010
[ 4.966172] mlx5_core 0000:21:00.0: Port module event: module 0, Cable plugged
[ 4.972503] mlx5_core 0000:15:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(64) RxCqeCmprss(0)
[ 5.110943] mlx5_core 0000:21:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(64) RxCqeCmprss(0)
[ 5.247925] mlx5_core 0000:15:00.0 enp21s0: renamed from eth0
[ 5.248600] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[ 5.275912] mlx5_core 0000:21:00.0 enp33s0: renamed from eth1
[ 23.736990] mlx5_core 0000:21:00.0 enp33s0: Link up
[ 23.953415] mlx5_core 0000:15:00.0 enp21s0: Link up
[ 188.138172] Modules linked in: nvme_rdma rdma_ucm rdma_cm nvme_fabrics nvme_core ib_ucm ib_uverbs ib_umad iw_cm ib_cm nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec aes_x86_64 crypto_simd glue_helper cryptd snd_hda_core snd_hwdep intel_cstate snd_pcm cp210x snd_seq_midi snd_seq_midi_event joydev input_leds snd_rawmidi usbserial snd_seq snd_seq_device snd_timer snd mei_me soundcore wmi_bmof hp_wmi sparse_keymap ioatdma mac_hid intel_rapl_perf mei dca intel_wmi_thunderbolt shpchp serio_raw sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 mlx5_ib ib_core amdgpu chash hid_generic usbhid hid
[ 188.138248] radeon i2c_algo_bit ttm mlx5_core drm_kms_helper syscopyarea e1000e sysfillrect mlxfw sysimgblt devlink ahci fb_sys_fops ptp psmouse drm pps_core libahci wmi
[ 662.506623] Modules linked in: cfg80211 nvme_rdma rdma_ucm rdma_cm nvme_fabrics nvme_core ib_ucm ib_uverbs ib_umad iw_cm ib_cm nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec aes_x86_64 crypto_simd glue_helper cryptd snd_hda_core snd_hwdep intel_cstate snd_pcm cp210x snd_seq_midi snd_seq_midi_event joydev input_leds snd_rawmidi usbserial snd_seq snd_seq_device snd_timer snd mei_me soundcore wmi_bmof hp_wmi sparse_keymap ioatdma mac_hid intel_rapl_perf mei dca intel_wmi_thunderbolt shpchp serio_raw sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 mlx5_ib ib_core amdgpu chash hid_generic
Hi Bill,
Here is the link to the Release Notes for the Ubuntu 18.04 Inbox Driver. Section 2, Changes and New Features, lists support for Enhanced IPoIB on ConnectX-4 cards. I am also attaching the link to the User Manual for your reference. Which card are you using? You also mentioned you are receiving the following error: [ 57.573664] ib_ipoib: unknown parameter 'ipoib_enhanced' ignored. Is this after testing with MLNX_OFED? If yes, which OFED version did you test with?
http://www.mellanox.com/pdf/prod_software/Ubuntu_18.04_Inbox_Driver_Release_Notes.pdf
http://www.mellanox.com/pdf/prod_software/Ubuntu_18_04_Inbox_Driver_User_Manual.pdf
I am not familiar with pktgen, so I may be missing the point, but what really catches my eye is your 52G throughput on the TX side. That's exactly the same rate I saw when I ran an FIO test with ConnectX-5, and I am puzzled why it doesn't come close to line rate (see Bad RoCEv2 throughput with ConnectX-5). I am using the stock Linux driver instead of Mellanox OFED.
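For context, a typical throughput-oriented FIO job against an NVMe-oF block device looks roughly like the line below; the device name and parameters are placeholders, not the exact job from my test:
fio --name=seqread --filename=/dev/nvme0n1 --rw=read --bs=128k --ioengine=libaio --iodepth=64 --numjobs=4 --direct=1 --runtime=60 --time_based --group_reporting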
Hi Bill,
Here is the link to the Release Notes for Ubuntu 18.04 Inbox Driver. Section 2
Changes and New Features lists the support for Enhanced IPoIB for ConnectX-4
cards. Also, I am attaching the link for the User Manual for your reference.
The documentation was quite helpful, thanks.
Which card are you using?
root@k1:~# lspci | grep -i mellanox
06:00.0 Infiniband controller: Mellanox Technologies MT27700 Family
Also, you mentioned you are receiving the following
error ib_ipoib: unknown parameter 'ipoib_enhanced' ignored. Is
this after testing with MLNX_OFED? If yes, which OFED version did you test it with?
I tried what comes with Ubuntu 18.04 and with
"mlnx-en-4.4-2.0.7.0-ubuntu18.04-x86_64". I'm open to a solution with either.
Do you have any recommendations for getting IPoIB connected mode to work with
either the native Ubuntu 18.04 drivers or the MLNX_OFED drivers?
I tried this because it was recommended in the Mellanox forum:
root@k2:/etc/modprobe.d# cat ib_ipoib.conf
options ib_ipoib ipoib_enhanced=0
But when I boot, the above-mentioned error shows up.
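A quick sysfs check shows whether the loaded ib_ipoib actually exposes that parameter; if the file below is missing, the running driver build doesn't support it, which would match the 'unknown parameter' message (whether the file appears at all depends on how the module was built, so treat this as a hint rather than a definitive test):
cat /sys/module/ib_ipoib/parameters/ipoib_enhanced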
When I install the MLNX_OFED driver, I don't see any errors. Here's the summary:
Device #1:
From what I can tell, this no longer works with Ubuntu 18.04. When I try this solution, during boot I get:
[ 4128.198929] ib_ipoib: unknown parameter 'ipoib_enhanced' ignored
I tried both with the native 18.04 drivers and after installing mlnx-en-4.4-2.0.7.0-ubuntu18.04-x86_64.tgz.
I had the wrong package installed; MLNX_OFED fixed the problem, and connected mode now works.
Is there any way to get connected mode working with the drivers that come with Ubuntu 18.04?
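For context, with the non-enhanced IPoIB driver, connected mode is normally selected per interface via sysfs, e.g. (the interface name ib0 is an assumption; check ip link for the real name):
echo connected > /sys/class/net/ib0/mode
ip link set ib0 mtu 65520          # connected mode allows the larger IPoIB MTU
cat /sys/class/net/ib0/mode        # should report 'connected'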
Hi,
I'm designing a next-generation back-end storage network for one of our customers, where I'm recommending they go with NVMe over the RoCE protocol on 100GbE.
Please can anyone help me validate the setup below - is the ConnectX-4 EN compatible with the NetApp EF570?
Networking Gear:
Spine switch – Arista 7060CX2-32S
Leaf Switch – Arista 7280CR-48
Storage:
NetApp EF570 (NIC – Mellanox ConnectX®-4 EN Adapter Card, Dual-Port 100 Gigabit Ethernet Adapter)
I would really appreciate a response at the earliest.
Nabeel
Thanks for the firmware note. I was incorrect about the firmware on the ConnectX-4 being the latest. After updating to the latest firmware, ethtool -m works as expected.
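For anyone searching later, the check is simply reading the module EEPROM on the port in question (the interface name here is an example):
ethtool -m enp21s0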
How do I calculate Symbol Errors? I need the formula used to calculate the symbol error counter.
Any help?
We are planning to set up iSER interop testing with ConnectX-4 Lx on the latest Windows Server 2019.
Does anyone know when the driver/firmware will be released for Windows Server 2019? Thanks!
Hi,
Please bear with me: I am a real beginner with InfiniBand interconnects and need help.
I want to set up InfiniBand FDR for an HPC cluster. I haven't ordered the parts yet, but I am thinking of the following:
First of all, my interest is latency more than bandwidth. First question:
On the other hand, regarding software and OS: the cluster is running very well with 10GbE SFP+ on Linux CentOS, so would setting up an InfiniBand interconnect be more complicated (things to do and things to avoid)?
I certainly have other questions, but that's all for now.
Thanks for the help.
BR
Hi,
Before going with RoCE: are you able to run TCP/IP traffic (including UDP and ICMP) over VXLAN? RoCE v2 is based on UDP, so the plain IP/UDP path over the VXLAN tunnel has to work first.
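A quick way to verify the plain IP path is to bring up a VXLAN interface on both hosts with iproute2 and ping across the tunnel (the VNI, addresses, and device names below are placeholders):
ip link add vxlan0 type vxlan id 42 dev enp21s0 remote <peer_underlay_ip> dstport 4789
ip addr add 10.10.10.1/24 dev vxlan0
ip link set vxlan0 up
ping 10.10.10.2                     # the peer's vxlan0 address
If ICMP and UDP work over the tunnel, RoCE v2 traffic can then be tested on top of it.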
Hi Mohammed,
Mellanox HCA are compatible with SPECS, protocols, etc. When asking about compatibility against external appliance it might be better to ask regarding specific features if it supported by adapter. Are you looking for RoCE support - it is supported, NVMe - it is supported. If your question is if NetApp EF570 supports specific piece of the hardware like Mellanox HCA, the question should be addressed to NetApp as they use their own version of the driver and the firmware and might have some hardware compatibility matrix.