Re: mlnxofedinstall of 4.3-3.0.2.1-rhel7.5alternate-aarch64 has some checking bug need to be fixed

July 12, 2018, 4:57 am

≫ Next: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 On Ubuntu 16.04

≪ Previous: Re: How to enable VF multi-queue for SR-IOV on KVM?

Hi,

I have here a machine with Centos 7.5 on ARM and cannot see the same output.

I would like to investigate it even if you already a workaround.

For this purpose, I need you to open a case at support@mellanox.com

Thanks in advance

Marc

↧

Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 On Ubuntu 16.04

July 12, 2018, 3:56 pm

≫ Next: Small redundant MLAG setup

≪ Previous: Re: mlnxofedinstall of 4.3-3.0.2.1-rhel7.5alternate-aarch64 has some checking bug need to be fixed

Hi, have a Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 installed server with Ubuntu 16.04 and were able to install the drivers etc.,

But It card doesn't show up using ifconfig -a

Any ideas? Is this version of OS and Kernel supported for ConnectX VPI PCIe 2.0?

Here is more info:

root@ubuntu16-sdc:~# root@ubuntu16-sdc:~# uname -a

Linux ubuntu16-sdc 4.8.0-44-generic #47~16.04.1-Ubuntu SMP Wed Mar 22 18:51:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

root@ubuntu16-sdc:~# lspci | grep Mell

03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

root@ubuntu16-sdc:~# /etc/init.d/openibd restart

Unloading HCA driver: [ OK ]

Loading HCA driver and Access Layer: [ OK ]

root@ubuntu16-sdc:~# hca_self_test.ofed

---- Performing Adapter Device Self Test ----

Number of CAs Detected ................. 1

PCI Device Check ....................... PASS

Kernel Arch ............................ x86_64

Host Driver Version .................... MLNX_OFED_LINUX-4.4-1.0.0.0 (OFED-4.4-1.0.0): 4.8.0-44-generic

Host Driver RPM Check .................. PASS

Firmware on CA #0 HCA .................. v2.10.0720

Host Driver Initialization ............. PASS

Number of CA Ports Active .............. 0

Kernel Syslog Check .................... PASS

Node GUID on CA #0 (HCA) ............... NA

------------------ DONE ---------------------

root@ubuntu16-sdc:~# mlxfwmanager --online -u -d 0000:03:00.0

Querying Mellanox devices firmware ...

Device #1:

----------

Device Type: ConnectX2

Part Number: MHQH19B-XTR_A1-A3

Description: ConnectX-2 VPI adapter card; single-port 40Gb/s QSFP; PCIe2.0 x8 5.0GT/s; tall bracket; RoHS R6

PSID: MT_0D90110009

PCI Device Name: 0000:03:00.0

Port1 MAC: 0002c94f2ec0

Port2 MAC: 0002c94f2ec1

Versions: Current Available

FW 2.10.0720 N/A

Status: No matching image found

root@ubuntu16-sdc:~# lsmod | grep ib

ib_ucm 20480 0

ib_ipoib 172032 0

ib_cm 53248 3 rdma_cm,ib_ipoib,ib_ucm

ib_uverbs 106496 2 ib_ucm,rdma_ucm

ib_umad 24576 0

mlx5_ib 270336 0

mlx5_core 806912 2 mlx5_fpga_tools,mlx5_ib

mlx4_ib 212992 0

ib_core 286720 10 ib_cm,rdma_cm,ib_umad,ib_uverbs,ib_ipoib,iw_cm,mlx5_ib,ib_ucm,rdma_ucm,mlx4_ib

mlx4_core 348160 2 mlx4_en,mlx4_ib

mlx_compat 20480 15 ib_cm,rdma_cm,ib_umad,ib_core,mlx5_fpga_tools,ib_uverbs,mlx4_en,ib_ipoib,mlx5_core,iw_cm,mlx5_ib,mlx4_core,ib_ucm,rdma_ucm,mlx4_ib

devlink 28672 4 mlx4_en,mlx5_core,mlx4_core,mlx4_ib

libfc 114688 1 tcm_fc

libcomposite 65536 2 usb_f_tcm,tcm_usb_gadget

udc_core 53248 2 usb_f_tcm,libcomposite

scsi_transport_fc 61440 3 qla2xxx,tcm_qla2xxx,libfc

target_core_iblock 20480 0

target_core_mod 356352 9 iscsi_target_mod,usb_f_tcm,vhost_scsi,target_core_iblock,tcm_loop,tcm_qla2xxx,target_core_file,target_core_pscsi,tcm_fc

configfs 40960 6 rdma_cm,iscsi_target_mod,usb_f_tcm,target_core_mod,libcomposite

libiscsi_tcp 24576 1 iscsi_tcp

libiscsi 53248 2 libiscsi_tcp,iscsi_tcp

scsi_transport_iscsi 98304 3 libiscsi,iscsi_tcp

Please let me know if any other info needed..

↧

Small redundant MLAG setup

July 15, 2018, 3:46 am

≫ Next: DPDK with MLX4 VF on Hyper-v VM

≪ Previous: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 On Ubuntu 16.04

Hi there,

first of all thanks for all the great information that can be found here.

I'm trying to build a fully redundant setup for two racks in different locations with the smallest amount of switches. From my understanding MLAG will help me to keep redundancy towards the servers. That means for each rack:

use two switches to create a mlag domain
attach all servers to both switches.

So I need at least 4 switches. Now to my problem. I want to avoid another two spine switches. For my setup (two racks) this seems be a little overpowered and the spines would only use 4-6 ports each. The question mark in the picture shows where the magic must happen.

Looking at other posts I see the following option.

With MLNX-OS 3.6.6102 STP and MLAG can coexist.
I implement a fully redundant interconnect of all 4 switches.
I activate MSTP (as I have multiple VLANs)
MSTP will allow to utilize interconnection as good as possible

Is this ok or am I missing something?

Thanks in advance.

↧

DPDK with MLX4 VF on Hyper-v VM

July 16, 2018, 12:20 am

≫ Next: Re: Windows RDMA QoS and WinOF-2 1.80 issues

≪ Previous: Small redundant MLAG setup

Hello,

(Not sure if this is the right place to ask, if not , please kindly point out the right place or person)

I have a connect-3 CX354a installed on my Window Server 2016, and enabled SR-IOV on the card and the server, following the WinOF user guide.

On the ubuntu 18.02 VM created using MS Hyperv-v, I can see the mlnx VF working properly. But when I tried to run testpmd on the VM using the following command:

./testpmd -l 0-1 -n 4 --vdev=net_vdev_netvsc0,iface=eth1,force=1 -w 0002:00:02.0 --vdev=net_vdev_netvsc0,iface=eth2,force=1 -w 0003:00:02.0 -- --rxq=2 --txq=2 -i

I ran into an error:

PMD: mlx4.c:138: mlx4_dev_start(): 0x562bff05e040: cannot attach flow rules (code 12, "Cannot allocate memory"), flow error type 2, cause 0x7f39ef408780, message: flow rule rejected by device

The command format was suggested by MS Azure to run DPDK on their AN-enabled VMs. And it does work on Azure VMs.

Here is ibv_devinfo output from my vm:

root@myVM:~/MLNX_OFED_SRC-4.4-1.0.0.0# ibv_devinfo

hca_id: mlx4_0

transport: InfiniBand (0)

fw_ver: 2.42.5000

node_guid: 0014:0500:691f:c3fa

sys_image_guid: ec0d:9a03:001c:92e3

vendor_id: 0x02c9

vendor_part_id: 4100

hw_ver: 0x0

board_id: MT_1090120019

phys_port_cnt: 1

port: 1

state: PORT_DOWN (1)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

hca_id: mlx4_1

transport: InfiniBand (0)

fw_ver: 2.42.5000

node_guid: 0014:0500:76e0:b9d1

sys_image_guid: ec0d:9a03:001c:92e3

vendor_id: 0x02c9

vendor_part_id: 4100

hw_ver: 0x0

board_id: MT_1090120019

phys_port_cnt: 1

port: 1

state: PORT_DOWN (1)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

and here is the kernel module info of mlx4_en:

filename:	/lib/modules/4.15.0-23-generic/updates/dkms/mlx4_en.ko
version:	4.4-1.0.0
license:	Dual BSD/GPL
description:	Mellanox ConnectX HCA Ethernet driver
author:	Liran Liss, Yevgeny Petrilin
srcversion:	23E8E7A25194AE68387DC95
depends:	mlx4_core,mlx_compat,ptp,devlink
retpoline:	Y
name:	mlx4_en
vermagic:	4.15.0-23-generic SMP mod_unload
parm:	udev_dev_port_dev_id:Work with dev_id or dev_port when supported by the kernel. Range: 0 <= udev_dev_port_dev_id <= 2 (default = 0).
	0: Work with dev_port if supported by the kernel, otherwise work with dev_id.
	1: Work only with dev_id regardless of dev_port support.
	2: Work with both of dev_id and dev_port (if dev_port is supported by the kernel). (int)
parm:	udp_rss:Enable RSS for incoming UDP traffic or disabled (0) (uint)
parm:	pfctx:Priority based Flow Control policy on TX[7:0]. Per priority bit mask (uint)
parm:	pfcrx:Priority based Flow Control policy on RX[7:0]. Per priority bit mask (uint)
parm:	inline_thold:Threshold for using inline data (range: 17-104, default: 104) (uint)

One difference I notice between my local VM and Azure VM is that the mlx4_en.ko module is definitely different. Azure seems to be using a specialized version of mlx4_en. Is this the reason why the testpmd works on azure but not my local Hyper-v VM?

If so, how can I get a DPDK-capable driver for MS Hyper-v?

Thank you!

↧

Re: Windows RDMA QoS and WinOF-2 1.80 issues

July 16, 2018, 7:40 am

≫ Next: Yocto embedded build of rdma-core

≪ Previous: DPDK with MLX4 VF on Hyper-v VM

Hi,

vlan tagged traffic is not required - but I do need it in a vlan

Sorry for late reply but it took me longer then expected to get back to this.

Anyway I think I solved this. Apparently after earlier learning experience I ended up with mismatch DCBX config between card and Windows. I had it enabled in firmware:

LLDP_NB_DCBX_P1

LLDP_NB_RX_MODE_P1

LLDP_NB_TX_MODE_P1

and disabled in windows.

After resetting card firmware settings to default (DCBX not listening to switch) it's ok now even with 1.80 driver.

For resetting my card: mlxconfig.exe -d mt4115_pciconf0 reset

Thanks for suggestions.

↧

Yocto embedded build of rdma-core

July 16, 2018, 7:43 am

≫ Next: Question about RC (read) and UD (send/recv) performance over CX3 and CX4 NIC

≪ Previous: Re: Windows RDMA QoS and WinOF-2 1.80 issues

Greetings,

I'm working with an embedded build of the rdma-core code (and rdma-perftest but I'm not that far yet). We're doing a cross build of the rdma-core code using yocto targeted at an Altera Arria10 board which includes a dual-core ARM Cortex-A9 processor. I've been able to successfully build the kernel 4.15 with the rxe modules. However, when I build the userland and get to the rdma-core library I run into a number of issues. One has been particularly vexing. During the do_configure() stage of the yocto build for rdma-core, I get errors regarding the installation of the rdma_man_pages. In particular, in the buildlib/rdma_man.cmake file, there is a routine: function(rdma_man_pages) that fails. If I comment out the entire body of this function, I can get the binaries to build but that takes me to another problem during the do_install portion. It appears that pandoc is not available at this stage. I suppose I can try to add pandoc with a separate recipe and then try again. Anyone have any comments on this build issue?

Thanks,

↧

Question about RC (read) and UD (send/recv) performance over CX3 and CX4 NIC

July 16, 2018, 9:47 am

≫ Next: "Priority trust-mode is not supported on your system"?

≪ Previous: Yocto embedded build of rdma-core

I am using CX3 and CX4 NICs to measure the throughput of RDMA verbs (RC Read and UD Send/Recv).

When I use the same test code to measure the peak throughput of small messages on CX3 and CX4.

The performance of RDMA Read Verbs is lower than Send/Recv Verbs on CX3, while the comparison result is reversed on CX4.

How about the performance trend of new generation of NICs like CX5 or CX6?

↧

"Priority trust-mode is not supported on your system"?

July 17, 2018, 6:41 am

≫ Next: Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

≪ Previous: Question about RC (read) and UD (send/recv) performance over CX3 and CX4 NIC

Hello, I met a problem when I set the trust-mode for the ConnectX3-Pro 40GbE NIC.

The system information follows:

LSB Version: :core-4.1-amd64:core-4.1-noarch

Distributor ID: CentOS

Description: CentOS Linux release 7.3.1611 (Core)

Release: 7.3.1611

Codename: Core

The ConnectX3-Pro NIC information follows:

hca_id: mlx4_1

transport: InfiniBand (0)

fw_ver: 2.40.7000

node_guid: f452:1403:0095:2280

sys_image_guid: f452:1403:0095:2280

vendor_id: 0x02c9

vendor_part_id: 4103

hw_ver: 0x0

board_id: MT_1090111023

phys_port_cnt: 2

Device ports:

port: 1

state: PORT_ACTIVE (4)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

port: 2

state: PORT_DOWN (1)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

It's the first time that I have met this problem. So I don't know what to do.

What does this tip mean? Is the system version the main cause?

Waiting for your help.

Thanks.

↧

Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

July 17, 2018, 11:56 am

≫ Next: Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

≪ Previous: "Priority trust-mode is not supported on your system"?

Server:
> ib_send_bw -a -F --report_gbits
Client:
> ib_send_bw -a -F --report_gbits <serverIP>

Please let me know your results and thank you...

~Steve

↧

Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

July 17, 2018, 12:24 pm

≫ Next: ASAP2 Live Migration & H/W LAG

≪ Previous: Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

Hi Steve,

Here you are. Thanks for taking an interest. Still getting ~45 Gb/sec on both client and server. Here is the client output:

[root@vx01 ~]# ib_send_bw -a -F --report_gbits vx02

---------------------------------------------------------------------------------------

Send BW Test

Dual-port : OFF Device : mlx5_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

TX depth : 128

CQ Moderation : 100

Mtu : 4096[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x3e4 QPN 0x005e PSN 0x239861

remote address: LID 0x3e6 QPN 0x004c PSN 0xed4513

---------------------------------------------------------------------------------------

#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]

2 1000 0.098220 0.091887 5.742968

4 1000 0.20 0.19 6.037776

8 1000 0.40 0.39 6.071169

16 1000 0.78 0.67 5.220818

32 1000 1.53 1.43 5.576730

64 1000 3.16 3.10 6.053410

128 1000 6.20 6.16 6.012284

256 1000 12.35 12.28 5.997002

512 1000 22.67 22.47 5.486812

1024 1000 38.02 36.69 4.478158

2048 1000 42.26 42.04 2.565771

4096 1000 43.82 43.68 1.332978

8192 1000 44.63 44.63 0.681005

16384 1000 44.79 44.79 0.341728

32768 1000 45.21 45.21 0.172449

65536 1000 45.35 45.35 0.086506

131072 1000 45.45 45.45 0.043342

262144 1000 45.45 45.45 0.021670

524288 1000 45.47 45.47 0.010840

1048576 1000 45.47 45.47 0.005421

2097152 1000 45.48 45.48 0.002711

4194304 1000 45.48 45.48 0.001355

8388608 1000 45.48 45.48 0.000678

---------------------------------------------------------------------------------------

Here is the server output:

[root@vx02 ~]# ib_send_bw -a -F --report_gbits

************************************

* Waiting for client to connect... *

************************************

---------------------------------------------------------------------------------------

Send BW Test

Dual-port : OFF Device : mlx5_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

RX depth : 512

CQ Moderation : 100

Mtu : 4096[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x3e6 QPN 0x004c PSN 0xed4513

remote address: LID 0x3e4 QPN 0x005e PSN 0x239861

---------------------------------------------------------------------------------------

#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]

2 1000 0.000000 0.099141 6.196311

4 1000 0.00 0.20 6.229974

8 1000 0.00 0.40 6.265230

16 1000 0.00 0.69 5.362016

32 1000 0.00 1.47 5.727960

64 1000 0.00 3.22 6.283794

128 1000 0.00 6.34 6.191118

256 1000 0.00 12.64 6.169975

512 1000 0.00 23.08 5.634221

1024 1000 0.00 37.53 4.581582

2048 1000 0.00 42.63 2.602155

4096 1000 0.00 44.07 1.344970

8192 1000 0.00 45.04 0.687191

16384 1000 0.00 45.04 0.343602

32768 1000 0.00 45.35 0.172994

65536 1000 0.00 45.45 0.086690

131072 1000 0.00 45.52 0.043409

262144 1000 0.00 45.51 0.021699

524288 1000 0.00 45.52 0.010852

1048576 1000 0.00 45.52 0.005427

2097152 1000 0.00 45.53 0.002714

4194304 1000 0.00 45.53 0.001357

8388608 1000 0.00 45.53 0.000678

---------------------------------------------------------------------------------------

Please let me know if you want any other info and I will send it straight away.

Regards,

Eric

↧

ASAP2 Live Migration & H/W LAG

July 17, 2018, 1:22 pm

≫ Next: The problem with RoCE connectivity between ConnectX-3 and ConnectX-4 Lx adapters

≪ Previous: Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

Hi,

Is ASAP2 OVS Offload support OpenStack Live Migration? If not, which ASAP2 mode should I use, OVS Acceleration or Application Acceleration (DPDK Offload)? How to have H/W LAG (with LACP) on each of those three modes?

Best regards,

↧

The problem with RoCE connectivity between ConnectX-3 and ConnectX-4 Lx adapters

July 18, 2018, 6:45 am

≫ Next: Unknown symbol nvme_find_pdev_from_bdev

≪ Previous: ASAP2 Live Migration & H/W LAG

Hello.

I have Microsoft Windows 2012 R2 cluster and some nodes have ConnectX-3 adapters and some nodes have ConnectX-4 Lx adapters.

There is RoCE connectivity between nodes with ConnectX-4 Lx adapters, but there isn’t connectivity between nodes with different adapters.

I think it’s because ConnectX-3 adapters use RoCE 1.0 mode, but ConnectX-4 Lx adapters use RoCE 2.0 mode.

I tred to change RoCE mode from 1.0 to 2.0 for ConnectX-3 adapters by “Set-MlnxDriverCoreSetting -RoceMode 2”, but had warning – “SingleFunc_2_0_0: RoCE v2.0 mode was requested, but it is not supported. The NIC starts in RoCE v1.5 mode” and RoCE connectivity doesn’t work.

What the best way to fix my problem ?

ConnectX-3 adapters don’t work in RoCE 2.0 mode at all ? How about new FW ? Today I have WinOF-5_35 with FW:

Image type: FS2

FW Version: 2.40.5032

FW Release Date: 16.1.2017

Product Version: 02.40.50.32

Rom Info: type=PXE version=3.4.747 devid=4099

Device ID: 4099

Description: Node Port1 Port2 Sys image

GUIDs: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff

MACs: e41d2ddfa540 e41d2ddfa541

VSD:

PSID: MT_1080120023

I don’t find way to change to RoCE 1.0 mode for ConnectX-4 Lx adapters for Microsoft Windows environment.

↧

Unknown symbol nvme_find_pdev_from_bdev

July 18, 2018, 11:17 am

≫ Next: Re: Firmware for MHJH29 ?

≪ Previous: The problem with RoCE connectivity between ConnectX-3 and ConnectX-4 Lx adapters

Hi all,

After installing MLNX_OFED_LINUX-4.4-1 on Ubuntu 18.04 (kernel 4.15.0-24) as "$ mlnxofedinstall --force --without-dkms --with-nvmf" I'm trying to use RDMA tools, but

- modprobe on nvme_rdma fails with "nvme_rdma: Unknown symbol nvme_delete_wq (err 0)"

- modprobe on nvmet_rdma fails with "nvmet: Unknown symbol nvme_find_pdev_from_bdev (err 0)"

What am I doing wrong, please?

I see two kernel modules loaded: nvme and nvme_core.

This is Mellanox MCX516A-CCAT ConnectX-5 EN Network Interface Card 100GbE Dual-Port QSFP28.

Any inputs will be greatly appreciated

Thank you

Dmitri Fedorov

Ciena Canada

↧

Re: Firmware for MHJH29 ?

July 19, 2018, 12:24 am

≫ Next: What are some good budget options for NICs and Switch?

≪ Previous: Unknown symbol nvme_find_pdev_from_bdev

> Service for this Hardware and FW has ended. So it will not be hosted on our site.

Thank you for your help. Seems this new firmware is no newer than the one I already have, unfortunately.

Those are 10+ years old cards... no surprise they are difficult to put back in service.

Cordially & thanks again for the great support,

↧

What are some good budget options for NICs and Switch?

July 19, 2018, 3:50 am

≫ Next: SN2100B v3.6.8004

≪ Previous: Re: Firmware for MHJH29 ?

I have a limited budget, and I want to buy a high speed network interface cards and a switch to connect multiple PCs into a local network (just a couple of meters away from one another) for HPC research.

Some of the key points for me are:

It should be compatible with Windows
Speed preferably 56 gigabit
Needs to be able to sustain a 100% load at all times
Non-managed Switch
No SFP(+) uplinks

Taking all of that into account what models should I look to buy? Perhaps some older models from a couple of years ago?

What are some caveats in building a system like that that I should consider?

What should I expect in terms of latency?

↧

SN2100B v3.6.8004

July 19, 2018, 7:42 pm

≫ Next: Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA

≪ Previous: What are some good budget options for NICs and Switch?

Hi All,

I don't know if someone already posted something about this.

I just want to share that when we upgraded our SN2100 to v3.6.8004, the GUI doesn't load. Luckily it can still be accessible via SSH and I changed the next boot partition and it rebooted fine. I downgraded the partition from a working version and now we can manage the switch both via GUI and CLI.

Hope this helps someone who wants to upgrade to this version.

↧

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA

July 20, 2018, 4:08 am

≫ Next: Header Data Split

≪ Previous: SN2100B v3.6.8004

I have encountered this question, too.

It was because of the ucx do not compile with cuda.(The mlnx install the default ucx).

When I recompile the ucx with cuda and reinstall it ,It works.

↧

Header Data Split

July 20, 2018, 8:54 am

≫ Next: Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

≪ Previous: Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA

I've made a feeble attempt to utilise Header Data Split (HDS) offload on Connect-X 5 adapters, by creating the striding WQ context with a non-zero log2_hds_buf_size value. However, the hardware won't have it and reports back bad_param error with syndrome 0x6aaebb.

According to an online error syndrome list, this translates to human readable as:

create_rq/rmp: log2_hds_buf_size not supported

Since the Public PRM does not describe HDS offload, I'm curious to whether certain preconditions need to met for this offload to work, or if this is a known restriction in current firmware? I'd also like to know if it's possible to configure the HDS "level", that is where the split happens (between L3/L4, L4/L5, ...).

The way I'd envision this feature to work is to zero-pad the end of headers up to log2_hds_buf_size, placing the upper layer payload at a fixed offset for any variable-size header length.

↧

Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

July 20, 2018, 2:11 pm

≫ Next: RoCE v2 configuration with Linux drivers and packages

≪ Previous: Header Data Split

Hello Eric -

I hope all is well...

You won't achieve a line rate of 56G/s because the NIC is: MCB193A-FCAT MT_1220110019 and your PCEi is 2.0

And the release notes for you FW state:

Connect-IB® Host Channel Adapter, single-port QSFP, FDR 56Gb/s,PCIe3.0 x16, tall bracket, RoHS R6

So getting ~45 ~48 Gb/s is good.

Have a great day!

Steve

↧

RoCE v2 configuration with Linux drivers and packages

July 20, 2018, 6:33 pm

≫ Next: send_bw test between QSFP ports on Dual Port Adapter

≪ Previous: Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot

is it possible to configure RoCE v2 with Connectx-4 card without MLNX_OFED? can someone please share info if there is any guide/doc available to configure with Linux drivers and packages?

I tried to do with drivers and packages but I am not able to succeed. When I used MLNX_OFED, RoCE is configured successfully.

↧