Hi,
I have a machine here with CentOS 7.5 on ARM and cannot see the same output.
I would like to investigate it even if you already have a workaround.
For this purpose, I need you to open a case at support@mellanox.com
Thanks in advance
Marc
Hi, I have a Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0] installed in a server with Ubuntu 16.04 and was able to install the drivers etc.
But the card doesn't show up using ifconfig -a.
Any ideas? Is this version of the OS and kernel supported for the ConnectX VPI PCIe 2.0?
Here is more info:
root@ubuntu16-sdc:~# uname -a
Linux ubuntu16-sdc 4.8.0-44-generic #47~16.04.1-Ubuntu SMP Wed Mar 22 18:51:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu16-sdc:~# lspci | grep Mell
03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
root@ubuntu16-sdc:~# /etc/init.d/openibd restart
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
root@ubuntu16-sdc:~# hca_self_test.ofed
---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 1
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... MLNX_OFED_LINUX-4.4-1.0.0.0 (OFED-4.4-1.0.0): 4.8.0-44-generic
Host Driver RPM Check .................. PASS
Firmware on CA #0 HCA .................. v2.10.0720
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 0
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (HCA) ............... NA
------------------ DONE ---------------------
root@ubuntu16-sdc:~# mlxfwmanager --online -u -d 0000:03:00.0
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX2
Part Number: MHQH19B-XTR_A1-A3
Description: ConnectX-2 VPI adapter card; single-port 40Gb/s QSFP; PCIe2.0 x8 5.0GT/s; tall bracket; RoHS R6
PSID: MT_0D90110009
PCI Device Name: 0000:03:00.0
Port1 MAC: 0002c94f2ec0
Port2 MAC: 0002c94f2ec1
Versions: Current Available
FW 2.10.0720 N/A
Status: No matching image found
root@ubuntu16-sdc:~# lsmod | grep ib
ib_ucm 20480 0
ib_ipoib 172032 0
ib_cm 53248 3 rdma_cm,ib_ipoib,ib_ucm
ib_uverbs 106496 2 ib_ucm,rdma_ucm
ib_umad 24576 0
mlx5_ib 270336 0
mlx5_core 806912 2 mlx5_fpga_tools,mlx5_ib
mlx4_ib 212992 0
ib_core 286720 10 ib_cm,rdma_cm,ib_umad,ib_uverbs,ib_ipoib,iw_cm,mlx5_ib,ib_ucm,rdma_ucm,mlx4_ib
mlx4_core 348160 2 mlx4_en,mlx4_ib
mlx_compat 20480 15 ib_cm,rdma_cm,ib_umad,ib_core,mlx5_fpga_tools,ib_uverbs,mlx4_en,ib_ipoib,mlx5_core,iw_cm,mlx5_ib,mlx4_core,ib_ucm,rdma_ucm,mlx4_ib
devlink 28672 4 mlx4_en,mlx5_core,mlx4_core,mlx4_ib
libfc 114688 1 tcm_fc
libcomposite 65536 2 usb_f_tcm,tcm_usb_gadget
udc_core 53248 2 usb_f_tcm,libcomposite
scsi_transport_fc 61440 3 qla2xxx,tcm_qla2xxx,libfc
target_core_iblock 20480 0
target_core_mod 356352 9 iscsi_target_mod,usb_f_tcm,vhost_scsi,target_core_iblock,tcm_loop,tcm_qla2xxx,target_core_file,target_core_pscsi,tcm_fc
configfs 40960 6 rdma_cm,iscsi_target_mod,usb_f_tcm,target_core_mod,libcomposite
libiscsi_tcp 24576 1 iscsi_tcp
libiscsi 53248 2 libiscsi_tcp,iscsi_tcp
scsi_transport_iscsi 98304 3 libiscsi,iscsi_tcp
Please let me know if any other info is needed.
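In the meantime, here are a few extra checks I can run and post if useful (just a sketch; the grep patterns are guesses on my side):
# kernel messages from the ConnectX driver
dmesg | grep -i mlx4
# interfaces the kernel knows about (equivalent to ifconfig -a)
ip link show
# port state and link layer as seen by the verbs stack
ibv_devinfo | grep -E 'state|link_layer'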
Hi there,
First of all, thanks for all the great information that can be found here.
I'm trying to build a fully redundant setup for two racks in different locations with the smallest number of switches. From my understanding, MLAG will help me keep redundancy towards the servers. That means for each rack:
So I need at least 4 switches. Now to my problem: I want to avoid another two spine switches. For my setup (two racks) this seems to be a little overpowered, and the spines would only use 4-6 ports each. The question mark in the picture shows where the magic must happen.
Looking at other posts, I see the following option.
Is this ok or am I missing something?
Thanks in advance.
Hello,
(Not sure if this is the right place to ask; if not, please kindly point me to the right place or person.)
I have a ConnectX-3 CX354A installed in my Windows Server 2016 host, and I enabled SR-IOV on the card and the server, following the WinOF user guide.
On the Ubuntu 18.04 VM created using MS Hyper-V, I can see the Mellanox VF working properly. But when I tried to run testpmd on the VM using the following command:
./testpmd -l 0-1 -n 4 --vdev=net_vdev_netvsc0,iface=eth1,force=1 -w 0002:00:02.0 --vdev=net_vdev_netvsc0,iface=eth2,force=1 -w 0003:00:02.0 -- --rxq=2 --txq=2 -i
I ran into an error:
PMD: mlx4.c:138: mlx4_dev_start(): 0x562bff05e040: cannot attach flow rules (code 12, "Cannot allocate memory"), flow error type 2, cause 0x7f39ef408780, message: flow rule rejected by device
The command format was suggested by MS Azure for running DPDK on their AN-enabled VMs, and it does work on Azure VMs.
Here is ibv_devinfo output from my vm:
root@myVM:~/MLNX_OFED_SRC-4.4-1.0.0.0# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.42.5000
node_guid: 0014:0500:691f:c3fa
sys_image_guid: ec0d:9a03:001c:92e3
vendor_id: 0x02c9
vendor_part_id: 4100
hw_ver: 0x0
board_id: MT_1090120019
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
hca_id: mlx4_1
transport: InfiniBand (0)
fw_ver: 2.42.5000
node_guid: 0014:0500:76e0:b9d1
sys_image_guid: ec0d:9a03:001c:92e3
vendor_id: 0x02c9
vendor_part_id: 4100
hw_ver: 0x0
board_id: MT_1090120019
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
and here is the kernel module info of mlx4_en:
filename:       /lib/modules/4.15.0-23-generic/updates/dkms/mlx4_en.ko
version:        4.4-1.0.0
license:        Dual BSD/GPL
description:    Mellanox ConnectX HCA Ethernet driver
author:         Liran Liss, Yevgeny Petrilin
srcversion:     23E8E7A25194AE68387DC95
depends:        mlx4_core,mlx_compat,ptp,devlink
retpoline:      Y
name:           mlx4_en
vermagic:       4.15.0-23-generic SMP mod_unload
parm:           udev_dev_port_dev_id:Work with dev_id or dev_port when supported by the kernel. Range: 0 <= udev_dev_port_dev_id <= 2 (default = 0).
                0: Work with dev_port if supported by the kernel, otherwise work with dev_id.
                1: Work only with dev_id regardless of dev_port support.
                2: Work with both of dev_id and dev_port (if dev_port is supported by the kernel). (int)
parm:           udp_rss:Enable RSS for incoming UDP traffic or disabled (0) (uint)
parm:           pfctx:Priority based Flow Control policy on TX[7:0]. Per priority bit mask (uint)
parm:           pfcrx:Priority based Flow Control policy on RX[7:0]. Per priority bit mask (uint)
parm:           inline_thold:Threshold for using inline data (range: 17-104, default: 104) (uint)
One difference I notice between my local VM and the Azure VM is that the mlx4_en.ko module is definitely different; Azure seems to be using a specialized version of mlx4_en. Is this the reason why testpmd works on Azure but not on my local Hyper-V VM?
If so, how can I get a DPDK-capable driver for MS Hyper-V?
Thank you!
Hi,
VLAN-tagged traffic is not required, but I do need it in a VLAN.
Sorry for the late reply, but it took me longer than expected to get back to this.
Anyway, I think I solved this. Apparently, after an earlier learning exercise, I ended up with a mismatched DCBX config between the card and Windows. I had it enabled in firmware:
LLDP_NB_DCBX_P1
LLDP_NB_RX_MODE_P1
LLDP_NB_TX_MODE_P1
and disabled in Windows.
After resetting the card's firmware settings to default (DCBX not listening to the switch), it's OK now, even with the 1.80 driver.
For resetting my card: mlxconfig.exe -d mt4115_pciconf0 reset
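In case it helps anyone else, the firmware-side LLDP/DCBX state can be double-checked before and after the reset with a query like this (a sketch; output layout differs a bit between mlxconfig versions):
mlxconfig.exe -d mt4115_pciconf0 query | findstr LLDP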
Thanks for suggestions.
Greetings,
I'm working with an embedded build of the rdma-core code (and rdma-perftest but I'm not that far yet). We're doing a cross build of the rdma-core code using yocto targeted at an Altera Arria10 board which includes a dual-core ARM Cortex-A9 processor. I've been able to successfully build the kernel 4.15 with the rxe modules. However, when I build the userland and get to the rdma-core library I run into a number of issues. One has been particularly vexing. During the do_configure() stage of the yocto build for rdma-core, I get errors regarding the installation of the rdma_man_pages. In particular, in the buildlib/rdma_man.cmake file, there is a routine: function(rdma_man_pages) that fails. If I comment out the entire body of this function, I can get the binaries to build but that takes me to another problem during the do_install portion. It appears that pandoc is not available at this stage. I suppose I can try to add pandoc with a separate recipe and then try again. Anyone have any comments on this build issue?
Thanks,
FM
I am using CX3 and CX4 NICs to measure the throughput of RDMA verbs (RC Read and UD Send/Recv).
When I use the same test code to measure the peak throughput of small messages on CX3 and CX4, the performance of RDMA Read verbs is lower than that of Send/Recv verbs on CX3, while the result is reversed on CX4.
What is the performance trend for newer generations of NICs such as CX5 or CX6?
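For reference, the equivalent measurement with the standard perftest tools would look roughly like this (a sketch; I actually use my own test code, and the message size/iteration count here are only illustrative):
# RC Read throughput, small messages (run the first command on the server, the second on the client)
ib_read_bw -s 64 -n 100000 -F --report_gbits
ib_read_bw -s 64 -n 100000 -F --report_gbits <serverIP>
# UD Send/Recv throughput for comparison
ib_send_bw -c UD -s 64 -n 100000 -F --report_gbits
ib_send_bw -c UD -s 64 -n 100000 -F --report_gbits <serverIP>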
Hello, I ran into a problem when setting the trust mode for the ConnectX-3 Pro 40GbE NIC.
The system information follows:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.3.1611 (Core)
Release: 7.3.1611
Codename: Core
The ConnectX-3 Pro NIC information follows:
hca_id: mlx4_1
transport: InfiniBand (0)
fw_ver: 2.40.7000
node_guid: f452:1403:0095:2280
sys_image_guid: f452:1403:0095:2280
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
Device ports:
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
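For context, the trust mode was being set with the mlnx_qos tool, roughly as follows (a sketch; <interface> is a placeholder for my actual 40GbE port):
# show the current QoS/trust configuration of the port
mlnx_qos -i <interface>
# switch the port to DSCP trust mode - this is where I hit the problem
mlnx_qos -i <interface> --trust dscp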
This is the first time I have encountered this problem, so I don't know what to do.
What does this message mean? Is the system version the main cause?
Waiting for your help.
Thanks.
Server:
> ib_send_bw -a -F --report_gbits
Client:
> ib_send_bw -a -F --report_gbits <serverIP>
Please let me know your results and thank you...
~Steve
Hi Steve,
Here you are. Thanks for taking an interest. Still getting ~45 Gb/sec on both client and server. Here is the client output:
[root@vx01 ~]# ib_send_bw -a -F --report_gbits vx02
---------------------------------------------------------------------------------------
Send BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x3e4 QPN 0x005e PSN 0x239861
remote address: LID 0x3e6 QPN 0x004c PSN 0xed4513
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
2 1000 0.098220 0.091887 5.742968
4 1000 0.20 0.19 6.037776
8 1000 0.40 0.39 6.071169
16 1000 0.78 0.67 5.220818
32 1000 1.53 1.43 5.576730
64 1000 3.16 3.10 6.053410
128 1000 6.20 6.16 6.012284
256 1000 12.35 12.28 5.997002
512 1000 22.67 22.47 5.486812
1024 1000 38.02 36.69 4.478158
2048 1000 42.26 42.04 2.565771
4096 1000 43.82 43.68 1.332978
8192 1000 44.63 44.63 0.681005
16384 1000 44.79 44.79 0.341728
32768 1000 45.21 45.21 0.172449
65536 1000 45.35 45.35 0.086506
131072 1000 45.45 45.45 0.043342
262144 1000 45.45 45.45 0.021670
524288 1000 45.47 45.47 0.010840
1048576 1000 45.47 45.47 0.005421
2097152 1000 45.48 45.48 0.002711
4194304 1000 45.48 45.48 0.001355
8388608 1000 45.48 45.48 0.000678
---------------------------------------------------------------------------------------
Here is the server output:
[root@vx02 ~]# ib_send_bw -a -F --report_gbits
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
Send BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
RX depth : 512
CQ Moderation : 100
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x3e6 QPN 0x004c PSN 0xed4513
remote address: LID 0x3e4 QPN 0x005e PSN 0x239861
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
2 1000 0.000000 0.099141 6.196311
4 1000 0.00 0.20 6.229974
8 1000 0.00 0.40 6.265230
16 1000 0.00 0.69 5.362016
32 1000 0.00 1.47 5.727960
64 1000 0.00 3.22 6.283794
128 1000 0.00 6.34 6.191118
256 1000 0.00 12.64 6.169975
512 1000 0.00 23.08 5.634221
1024 1000 0.00 37.53 4.581582
2048 1000 0.00 42.63 2.602155
4096 1000 0.00 44.07 1.344970
8192 1000 0.00 45.04 0.687191
16384 1000 0.00 45.04 0.343602
32768 1000 0.00 45.35 0.172994
65536 1000 0.00 45.45 0.086690
131072 1000 0.00 45.52 0.043409
262144 1000 0.00 45.51 0.021699
524288 1000 0.00 45.52 0.010852
1048576 1000 0.00 45.52 0.005427
2097152 1000 0.00 45.53 0.002714
4194304 1000 0.00 45.53 0.001357
8388608 1000 0.00 45.53 0.000678
---------------------------------------------------------------------------------------
Please let me know if you want any other info and I will send it straight away.
Regards,
Eric
Hi,
Does ASAP2 OVS Offload support OpenStack Live Migration? If not, which ASAP2 mode should I use: OVS Acceleration or Application Acceleration (DPDK Offload)? How can I have H/W LAG (with LACP) in each of those three modes?
Best regards,
Hello.
I have a Microsoft Windows 2012 R2 cluster in which some nodes have ConnectX-3 adapters and some nodes have ConnectX-4 Lx adapters.
There is RoCE connectivity between nodes with ConnectX-4 Lx adapters, but there isn’t connectivity between nodes with different adapters.
I think it’s because ConnectX-3 adapters use RoCE 1.0 mode, but ConnectX-4 Lx adapters use RoCE 2.0 mode.
I tried to change the RoCE mode from 1.0 to 2.0 for the ConnectX-3 adapters with "Set-MlnxDriverCoreSetting -RoceMode 2", but got a warning: "SingleFunc_2_0_0: RoCE v2.0 mode was requested, but it is not supported. The NIC starts in RoCE v1.5 mode", and RoCE connectivity doesn't work.
What is the best way to fix my problem?
Do ConnectX-3 adapters not work in RoCE v2.0 mode at all? What about newer FW? Today I have WinOF-5_35 with FW:
Image type: FS2
FW Version: 2.40.5032
FW Release Date: 16.1.2017
Product Version: 02.40.50.32
Rom Info: type=PXE version=3.4.747 devid=4099
Device ID: 4099
Description: Node Port1 Port2 Sys image
GUIDs: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
MACs: e41d2ddfa540 e41d2ddfa541
VSD:
PSID: MT_1080120023
I can't find a way to change the ConnectX-4 Lx adapters to RoCE v1.0 mode in a Microsoft Windows environment.
Hi all,
After installing MLNX_OFED_LINUX-4.4-1 on Ubuntu 18.04 (kernel 4.15.0-24) with "$ mlnxofedinstall --force --without-dkms --with-nvmf", I'm trying to use the RDMA tools, but:
- modprobe on nvme_rdma fails with "nvme_rdma: Unknown symbol nvme_delete_wq (err 0)"
- modprobe on nvmet_rdma fails with "nvmet: Unknown symbol nvme_find_pdev_from_bdev (err 0)"
What am I doing wrong, please?
I see two kernel modules loaded: nvme and nvme_core.
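To show where those modules come from, here is what I can check on my side (a sketch; paths may differ on other systems):
# which .ko files are resolved - the inbox kernel tree vs. the MLNX_OFED 'updates' tree
modinfo -n nvme nvme_core nvme_rdma nvmet_rdma
# what is currently loaded
lsmod | grep -E 'nvme|rdma'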
This is Mellanox MCX516A-CCAT ConnectX-5 EN Network Interface Card 100GbE Dual-Port QSFP28.
Any inputs will be greatly appreciated
Thank you
Dmitri Fedorov
Ciena Canada
> Service for this Hardware and FW has ended. So it will not be hosted on our site.
Thank you for your help. It seems this new firmware is no newer than the one I already have, unfortunately.
Those are 10+ year old cards... no surprise they are difficult to put back into service.
Cordially & thanks again for the great support,
I have a limited budget, and I want to buy high-speed network interface cards and a switch to connect multiple PCs (just a couple of meters away from one another) into a local network for HPC research.
Some of the key points for me are:
Taking all of that into account, what models should I look to buy? Perhaps some older models from a couple of years ago?
What are some caveats I should consider in building a system like that?
What should I expect in terms of latency?
Hi All,
I don't know if someone has already posted something about this.
I just want to share that when we upgraded our SN2100 to v3.6.8004, the GUI didn't load. Luckily the switch was still accessible via SSH, so I changed the next boot partition and it rebooted fine. I downgraded to the partition with a working version, and now we can manage the switch both via the GUI and the CLI.
Hope this helps someone who wants to upgrade to this version.
I have encountered this issue, too.
It was because UCX was not compiled with CUDA support (MLNX_OFED installs the default UCX).
After I recompiled UCX with CUDA and reinstalled it, it works.
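For reference, the rebuild was roughly like this (a sketch; the prefix and CUDA path should match your own system):
cd ucx
./autogen.sh                      # only needed when building from a git checkout
./contrib/configure-release --prefix=/usr --with-cuda=/usr/local/cuda
make -j$(nproc)
sudo make install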
I've made a feeble attempt to utilise Header Data Split (HDS) offload on ConnectX-5 adapters, by creating the striding WQ context with a non-zero log2_hds_buf_size value. However, the hardware won't have it and reports back a bad_param error with syndrome 0x6aaebb.
According to an online error syndrome list, this translates to human readable as:
create_rq/rmp: log2_hds_buf_size not supported
Since the Public PRM does not describe HDS offload, I'm curious whether certain preconditions need to be met for this offload to work, or whether this is a known restriction in current firmware. I'd also like to know if it's possible to configure the HDS "level", that is, where the split happens (between L3/L4, L4/L5, ...).
The way I'd envision this feature to work is to zero-pad the end of headers up to log2_hds_buf_size, placing the upper layer payload at a fixed offset for any variable-size header length.
Hello Eric -
I hope all is well...
You won't achieve a line rate of 56 Gb/s because the NIC is MCB193A-FCAT (MT_1220110019) and your PCIe is 2.0.
And the release notes for your FW state:
Connect-IB® Host Channel Adapter, single-port QSFP, FDR 56Gb/s,PCIe3.0 x16, tall bracket, RoHS R6
So getting ~45-48 Gb/s is good.
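You can confirm the negotiated PCIe link on the host with something like this (a sketch; substitute the Connect-IB device's actual PCI address):
lspci | grep -i mellanox
lspci -s <pci_address> -vv | grep -E 'LnkCap|LnkSta'
If LnkSta reports 5GT/s (Gen2), it is the PCIe link rather than the HCA that sets the ceiling.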
Have a great day!
Steve
Is it possible to configure RoCE v2 on a ConnectX-4 card without MLNX_OFED? Can someone please share info on any guide/doc available for configuring it with inbox Linux drivers and packages?
I tried to do it with the inbox drivers and packages but was not able to succeed. When I used MLNX_OFED, RoCE was configured successfully.
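In case it helps to show what I attempted: with the inbox drivers, ConnectX-4 exposes both RoCE versions through the GID table, and the default version used by RDMA-CM can be inspected/changed through configfs. A sketch of what I tried, assuming the device is mlx5_0, port 1 (names are placeholders):
# list the GID entries and their RoCE type (v1 vs v2) for port 1
grep . /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/* 2>/dev/null
# check/set the default RoCE version used by RDMA-CM (requires configfs and the rdma_cm module)
sudo mount -t configfs none /sys/kernel/config 2>/dev/null
sudo mkdir /sys/kernel/config/rdma_cm/mlx5_0
cat /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
echo "RoCE v2" | sudo tee /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode
Is this the right direction, or is something else required without MLNX_OFED?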