Channel: Mellanox Interconnect Community: Message List

Re: mlnxofedinstall of 4.3-3.0.2.1-rhel7.5alternate-aarch64 has some checking bug need to be fixed


Hi,

 

 

I have a machine here with CentOS 7.5 on ARM and cannot reproduce the same output.

I would like to investigate it even if you already have a workaround.

For this purpose, please open a case at support@mellanox.com.

 

Thanks in advance

Marc


Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0] on Ubuntu 16.04


Hi, I have a server with a Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0] adapter installed, running Ubuntu 16.04, and was able to install the drivers etc.

 

But the card doesn't show up using ifconfig -a.

 

Any ideas? Is this version of OS and Kernel supported for ConnectX VPI PCIe 2.0?

 

Here is more info:

root@ubuntu16-sdc:~# uname -a

Linux ubuntu16-sdc 4.8.0-44-generic #47~16.04.1-Ubuntu SMP Wed Mar 22 18:51:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

 

root@ubuntu16-sdc:~# lspci | grep Mell

03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

 

root@ubuntu16-sdc:~# /etc/init.d/openibd restart

Unloading HCA driver:                                      [  OK  ]

Loading HCA driver and Access Layer:                       [  OK  ]

 

root@ubuntu16-sdc:~# hca_self_test.ofed

 

---- Performing Adapter Device Self Test ----

Number of CAs Detected ................. 1

PCI Device Check ....................... PASS

Kernel Arch ............................ x86_64

Host Driver Version .................... MLNX_OFED_LINUX-4.4-1.0.0.0 (OFED-4.4-1.0.0): 4.8.0-44-generic

Host Driver RPM Check .................. PASS

Firmware on CA #0 HCA .................. v2.10.0720

Host Driver Initialization ............. PASS

Number of CA Ports Active .............. 0

Kernel Syslog Check .................... PASS

Node GUID on CA #0 (HCA) ............... NA

------------------ DONE ---------------------

 

root@ubuntu16-sdc:~#  mlxfwmanager --online -u -d 0000:03:00.0

Querying Mellanox devices firmware ...

Device #1:

----------

  Device Type:      ConnectX2

  Part Number:      MHQH19B-XTR_A1-A3

  Description:      ConnectX-2 VPI adapter card; single-port 40Gb/s QSFP; PCIe2.0 x8 5.0GT/s; tall bracket; RoHS R6

  PSID:             MT_0D90110009

  PCI Device Name:  0000:03:00.0

  Port1 MAC:        0002c94f2ec0

  Port2 MAC:        0002c94f2ec1

  Versions:         Current        Available

     FW             2.10.0720      N/A

 

  Status:           No matching image found

 

 

root@ubuntu16-sdc:~# lsmod | grep ib

ib_ucm                 20480  0

ib_ipoib              172032  0

ib_cm                  53248  3 rdma_cm,ib_ipoib,ib_ucm

ib_uverbs             106496  2 ib_ucm,rdma_ucm

ib_umad                24576  0

mlx5_ib               270336  0

mlx5_core             806912  2 mlx5_fpga_tools,mlx5_ib

mlx4_ib               212992  0

ib_core               286720  10 ib_cm,rdma_cm,ib_umad,ib_uverbs,ib_ipoib,iw_cm,mlx5_ib,ib_ucm,rdma_ucm,mlx4_ib

mlx4_core             348160  2 mlx4_en,mlx4_ib

mlx_compat             20480  15 ib_cm,rdma_cm,ib_umad,ib_core,mlx5_fpga_tools,ib_uverbs,mlx4_en,ib_ipoib,mlx5_core,iw_cm,mlx5_ib,mlx4_core,ib_ucm,rdma_ucm,mlx4_ib

devlink                28672  4 mlx4_en,mlx5_core,mlx4_core,mlx4_ib

libfc                 114688  1 tcm_fc

libcomposite           65536  2 usb_f_tcm,tcm_usb_gadget

udc_core               53248  2 usb_f_tcm,libcomposite

scsi_transport_fc      61440  3 qla2xxx,tcm_qla2xxx,libfc

target_core_iblock     20480  0

target_core_mod       356352  9 iscsi_target_mod,usb_f_tcm,vhost_scsi,target_core_iblock,tcm_loop,tcm_qla2xxx,target_core_file,target_core_pscsi,tcm_fc

configfs               40960  6 rdma_cm,iscsi_target_mod,usb_f_tcm,target_core_mod,libcomposite

libiscsi_tcp           24576  1 iscsi_tcp

libiscsi               53248  2 libiscsi_tcp,iscsi_tcp

scsi_transport_iscsi    98304  3 libiscsi,iscsi_tcp

 

Please let me know if any other info is needed.
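
In case it's relevant, here is how I can pull more port information from this box (standard mlx4/OFED commands; happy to post the output of any of these):

ibstat                                               # port state, link layer and GUIDs as seen by the IB stack
ip link show                                         # lists IPoIB interfaces (e.g. ib0) even when ifconfig shows nothing useful
cat /sys/bus/pci/devices/0000:03:00.0/mlx4_port1     # current port type of the ConnectX-2 (mlx4): "ib" or "eth"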

Small redundant MLAG setup


Hi there,

 

first of all thanks for all the great information that can be found here.

 

I'm trying to build a fully redundant setup for two racks in different locations with the smallest number of switches. From my understanding, MLAG will help me keep redundancy towards the servers. That means, for each rack:

  • use two switches to create an MLAG domain
  • attach all servers to both switches.

So I need at least 4 switches. Now to my problem: I want to avoid another two spine switches. For my setup (two racks) that seems a little overpowered, and the spines would only use 4-6 ports each. The question mark in the picture shows where the magic must happen.

[Attached image: mlag.png]

Looking at other posts, I see the following option:

  • With MLNX-OS 3.6.6102, STP and MLAG can coexist.
  • I implement a fully redundant interconnect of all 4 switches.
  • I activate MSTP (as I have multiple VLANs).
  • MSTP will utilize the interconnects as well as possible.

Is this ok or am I missing something?
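
To make that concrete, this is roughly the MLNX-OS configuration I have in mind on each of the four switches (syntax quoted from memory of 3.6.x, so please treat it as illustrative rather than exact):

protocol mlag              # enable the MLAG feature (IPL / VIP configured as in the MLAG guides)
spanning-tree mode mst     # run MSTP alongside MLAG on the inter-switch links
show mlag                  # verify MLAG state
show spanning-tree         # verify which inter-switch links MSTP blocks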

 

Thanks in advance.

DPDK with MLX4 VF on Hyper-v VM


Hello,

 

(Not sure if this is the right place to ask, if not , please kindly point out the right place or person)

 

I have a ConnectX-3 CX354A installed in my Windows Server 2016 host, and enabled SR-IOV on the card and the server, following the WinOF user guide.

 

On the Ubuntu 18.04 VM created using MS Hyper-V, I can see the Mellanox VF working properly. But when I try to run testpmd on the VM using the following command:

./testpmd -l 0-1 -n 4 --vdev=net_vdev_netvsc0,iface=eth1,force=1 -w 0002:00:02.0  --vdev=net_vdev_netvsc0,iface=eth2,force=1 -w 0003:00:02.0 -- --rxq=2 --txq=2 -i

 

I ran into an error:

 

PMD: mlx4.c:138: mlx4_dev_start(): 0x562bff05e040: cannot attach flow rules (code 12, "Cannot allocate memory"), flow error type 2, cause 0x7f39ef408780, message: flow rule rejected by device

 

The command format is the one suggested by MS Azure for running DPDK on their accelerated-networking-enabled VMs, and it does work on Azure VMs.

 

Here is ibv_devinfo output from my vm:

 

root@myVM:~/MLNX_OFED_SRC-4.4-1.0.0.0# ibv_devinfo

hca_id:    mlx4_0

    transport:            InfiniBand (0)

    fw_ver:                2.42.5000

    node_guid:            0014:0500:691f:c3fa

    sys_image_guid:            ec0d:9a03:001c:92e3

    vendor_id:            0x02c9

    vendor_part_id:            4100

    hw_ver:                0x0

    board_id:            MT_1090120019

    phys_port_cnt:            1

        port:    1

            state:            PORT_DOWN (1)

            max_mtu:        4096 (5)

            active_mtu:        1024 (3)

            sm_lid:            0

            port_lid:        0

            port_lmc:        0x00

            link_layer:        Ethernet

 

hca_id:    mlx4_1

    transport:            InfiniBand (0)

    fw_ver:                2.42.5000

    node_guid:            0014:0500:76e0:b9d1

    sys_image_guid:            ec0d:9a03:001c:92e3

    vendor_id:            0x02c9

    vendor_part_id:            4100

    hw_ver:                0x0

    board_id:            MT_1090120019

    phys_port_cnt:            1

        port:    1

            state:            PORT_DOWN (1)

            max_mtu:        4096 (5)

            active_mtu:        1024 (3)

            sm_lid:            0

            port_lid:        0

            port_lmc:        0x00

            link_layer:        Ethernet

 

and here is the kernel module info of mlx4_en:

 

filename:   /lib/modules/4.15.0-23-generic/updates/dkms/mlx4_en.ko
version:    4.4-1.0.0
license:    Dual BSD/GPL
description:Mellanox ConnectX HCA Ethernet driver
author:     Liran Liss, Yevgeny Petrilin
srcversion: 23E8E7A25194AE68387DC95
depends:    mlx4_core,mlx_compat,ptp,devlink
retpoline:  Y
name:       mlx4_en
vermagic:   4.15.0-23-generic SMP mod_unload
parm:       udev_dev_port_dev_id:Work with dev_id or dev_port when supported by the kernel. Range: 0 <= udev_dev_port_dev_id <= 2 (default = 0).
   0: Work with dev_port if supported by the kernel, otherwise work with dev_id.
   1: Work only with dev_id regardless of dev_port support.
   2: Work with both of dev_id and dev_port (if dev_port is supported by the kernel). (int)
parm:       udp_rss:Enable RSS for incoming UDP traffic or disabled (0) (uint)
parm:       pfctx:Priority based Flow Control policy on TX[7:0]. Per priority bit mask (uint)
parm:       pfcrx:Priority based Flow Control policy on RX[7:0]. Per priority bit mask (uint)
parm:       inline_thold:Threshold for using inline data (range: 17-104, default: 104) (uint)

 

One difference I notice between my local VM and the Azure VM is that the mlx4_en.ko module is different: Azure seems to be using a specialized version of mlx4_en. Is this the reason why testpmd works on Azure but not on my local Hyper-V VM?

 

If so, how can I get a DPDK-capable driver for MS Hyper-V?
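
One thing I still want to rule out: the DPDK mlx4 PMD documentation (for bare-metal setups at least; I'm not sure how much of this applies to a Hyper-V VF) says flow steering must be enabled via an mlx4_core module option inside the VM, roughly:

# /etc/modprobe.d/mlx4_core.conf -- enable device-managed flow steering for the mlx4 PMD
options mlx4_core log_num_mgm_entry_size=-7

# then reload the Mellanox modules (or simply reboot) and re-run testpmd
/etc/init.d/openibd restart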

 

Thank you!

Re: Windows RDMA QoS and WinOF-2 1.80 issues


Hi,

VLAN-tagged traffic is not required, but I do need it in a VLAN.

Sorry for the late reply, but it took me longer than expected to get back to this.

 

Anyway, I think I solved this. Apparently, after my earlier learning experience, I ended up with a mismatched DCBX config between the card and Windows. I had it enabled in firmware:

LLDP_NB_DCBX_P1

LLDP_NB_RX_MODE_P1

LLDP_NB_TX_MODE_P1

and disabled in Windows.

After resetting the card's firmware settings to default (DCBX not listening to the switch), it is OK now even with the 1.80 driver.

For resetting my card: mlxconfig.exe -d mt4115_pciconf0 reset
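
For anyone hitting the same mismatch, the same tool can be used to inspect and change those settings individually instead of resetting everything (I only verified the reset path myself, so treat the set example as a sketch):

mlxconfig.exe -d mt4115_pciconf0 query                   # lists all firmware settings, including the LLDP_NB_*/DCBX ones above
mlxconfig.exe -d mt4115_pciconf0 set LLDP_NB_DCBX_P1=0   # disable firmware DCBX on port 1 only (untested variant)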

Thanks for the suggestions.

Yocto embedded build of rdma-core


Greetings,

 

I'm working with an embedded build of the rdma-core code (and rdma-perftest, but I'm not that far yet). We're doing a cross-build of rdma-core using Yocto, targeted at an Altera Arria 10 board which includes a dual-core ARM Cortex-A9 processor. I've been able to successfully build the 4.15 kernel with the rxe modules. However, when I build the userland and get to the rdma-core library, I run into a number of issues. One has been particularly vexing. During the do_configure() stage of the Yocto build for rdma-core, I get errors regarding the installation of the rdma_man_pages. In particular, in the buildlib/rdma_man.cmake file, there is a routine, function(rdma_man_pages), that fails. If I comment out the entire body of this function, I can get the binaries to build, but that takes me to another problem during the do_install portion: it appears that pandoc is not available at this stage. I suppose I can try to add pandoc with a separate recipe and then try again. Does anyone have any comments on this build issue?
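
For reference, the workaround I'm experimenting with is a tiny bbappend that adds a native pandoc to the build; "pandoc-native" here is a placeholder for whatever pandoc recipe your layers actually provide, so treat this as a sketch only:

# rdma-core_%.bbappend
# "pandoc-native" is a placeholder recipe name -- substitute the pandoc recipe available in your layers
DEPENDS += "pandoc-native"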

 

Thanks,

FM

Question about RC (read) and UD (send/recv) performance over CX3 and CX4 NIC


I am using CX3 and CX4 NICs to measure the throughput of RDMA verbs (RC Read and UD Send/Recv).

 

When I use the same test code to measure the peak throughput of small messages on CX3 and CX4, the performance of RDMA Read verbs is lower than Send/Recv verbs on CX3, while the result is reversed on CX4.

 

What is the performance trend for newer generations of NICs such as CX5 or CX6?
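
For context, the comparison I'm describing can be reproduced with the standard perftest tools at a small message size, e.g.:

ib_read_bw -s 32 -F --report_gbits                    # server, RC RDMA Read
ib_read_bw -s 32 -F --report_gbits <server_ip>        # client
ib_send_bw -s 32 -c UD -F --report_gbits              # server, UD Send/Recv
ib_send_bw -s 32 -c UD -F --report_gbits <server_ip>  # client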

"Priority trust-mode is not supported on your system"?


Hello, I ran into a problem when setting the trust mode for the ConnectX-3 Pro 40GbE NIC.

The system information follows:

LSB Version: :core-4.1-amd64:core-4.1-noarch

Distributor ID: CentOS

Description: CentOS Linux release 7.3.1611 (Core)

Release: 7.3.1611

Codename: Core

The ConnectX3-Pro NIC information follows:

hca_id: mlx4_1

transport: InfiniBand (0)

fw_ver: 2.40.7000

node_guid: f452:1403:0095:2280

sys_image_guid: f452:1403:0095:2280

vendor_id: 0x02c9

vendor_part_id: 4103

hw_ver: 0x0

board_id: MT_1090111023

phys_port_cnt: 2

Device ports:

port: 1

state: PORT_ACTIVE (4)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

port: 2

state: PORT_DOWN (1)

max_mtu: 4096 (5)

active_mtu: 1024 (3)

sm_lid: 0

port_lid: 0

port_lmc: 0x00

link_layer: Ethernet

This is the first time I have encountered this problem, so I don't know what to do.

What does this message mean? Is the system version the main cause?
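
For reference, the trust mode is set with the mlnx_qos tool; a command of this form is what produces the message for me (the interface name here is just an example):

mlnx_qos -i ens2f0 --trust dscp     # -> "Priority trust-mode is not supported on your system"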

Waiting for your help.

Thanks.


Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot


Server:
> ib_send_bw -a -F --report_gbits
Client:
> ib_send_bw -a -F --report_gbits <serverIP>

 

Please let me know your results and thank you...

~Steve

Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot


Hi Steve,

 

Here you are.  Thanks for taking an interest.  Still getting ~45 Gb/sec on both client and server.  Here is the client output:

 

[root@vx01 ~]#  ib_send_bw -a -F --report_gbits vx02

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx5_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

TX depth        : 128

CQ Moderation   : 100

Mtu             : 4096[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x3e4 QPN 0x005e PSN 0x239861

remote address: LID 0x3e6 QPN 0x004c PSN 0xed4513

---------------------------------------------------------------------------------------

#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]

2          1000           0.098220            0.091887            5.742968

4          1000             0.20               0.19       6.037776

8          1000             0.40               0.39       6.071169

16         1000             0.78               0.67       5.220818

32         1000             1.53               1.43       5.576730

64         1000             3.16               3.10       6.053410

128        1000             6.20               6.16       6.012284

256        1000             12.35              12.28     5.997002

512        1000             22.67              22.47     5.486812

1024       1000             38.02              36.69     4.478158

2048       1000             42.26              42.04     2.565771

4096       1000             43.82              43.68     1.332978

8192       1000             44.63              44.63     0.681005

16384      1000             44.79              44.79     0.341728

32768      1000             45.21              45.21     0.172449

65536      1000             45.35              45.35     0.086506

131072     1000             45.45              45.45     0.043342

262144     1000             45.45              45.45     0.021670

524288     1000             45.47              45.47     0.010840

1048576    1000             45.47              45.47     0.005421

2097152    1000             45.48              45.48     0.002711

4194304    1000             45.48              45.48     0.001355

8388608    1000             45.48              45.48     0.000678

---------------------------------------------------------------------------------------

 

Here is the server output:

 

[root@vx02 ~]#  ib_send_bw -a -F --report_gbits

 

 

************************************

* Waiting for client to connect... *

************************************

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx5_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

RX depth        : 512

CQ Moderation   : 100

Mtu             : 4096[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x3e6 QPN 0x004c PSN 0xed4513

remote address: LID 0x3e4 QPN 0x005e PSN 0x239861

---------------------------------------------------------------------------------------

#bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]

2          1000           0.000000            0.099141            6.196311

4          1000             0.00               0.20       6.229974

8          1000             0.00               0.40       6.265230

16         1000             0.00               0.69       5.362016

32         1000             0.00               1.47       5.727960

64         1000             0.00               3.22       6.283794

128        1000             0.00               6.34       6.191118

256        1000             0.00               12.64     6.169975

512        1000             0.00               23.08     5.634221

1024       1000             0.00               37.53     4.581582

2048       1000             0.00               42.63     2.602155

4096       1000             0.00               44.07     1.344970

8192       1000             0.00               45.04     0.687191

16384      1000             0.00               45.04     0.343602

32768      1000             0.00               45.35     0.172994

65536      1000             0.00               45.45     0.086690

131072     1000             0.00               45.52     0.043409

262144     1000             0.00               45.51     0.021699

524288     1000             0.00               45.52     0.010852

1048576    1000             0.00               45.52     0.005427

2097152    1000             0.00               45.53     0.002714

4194304    1000             0.00               45.53     0.001357

8388608    1000             0.00               45.53     0.000678

---------------------------------------------------------------------------------------

 

Please let me know if you want any other info and I will send it straight away.

 

Regards,

 

Eric

ASAP2 Live Migration & H/W LAG


Hi,

 

Does ASAP2 OVS Offload support OpenStack Live Migration? If not, which ASAP2 mode should I use: OVS Acceleration or Application Acceleration (DPDK Offload)? How can I have H/W LAG (with LACP) in each of those three modes?
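
For reference, by "OVS Offload" I mean the standard OVS hardware-offload knob (the commands below are generic OVS, just to be clear about which mode I'm asking about; the service name varies by distro):

ovs-vsctl set Open_vSwitch . other_config:hw-offload=true   # enable OVS hardware offload (ASAP2 OVS Offload mode)
systemctl restart openvswitch                               # or openvswitch-switch, depending on the distro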

 

Best regards,

The problem with RoCE connectivity between ConnectX-3 and ConnectX-4 Lx adapters


Hello.

I have a Microsoft Windows 2012 R2 cluster; some nodes have ConnectX-3 adapters and some nodes have ConnectX-4 Lx adapters.

There is RoCE connectivity between nodes with ConnectX-4 Lx adapters, but there isn’t connectivity between nodes with different adapters.

I think it’s because ConnectX-3 adapters use RoCE  1.0 mode, but ConnectX-4 Lx adapters use RoCE 2.0 mode.

I tried to change the RoCE mode from 1.0 to 2.0 for the ConnectX-3 adapters with “Set-MlnxDriverCoreSetting -RoceMode 2”, but got the warning “SingleFunc_2_0_0: RoCE v2.0 mode was requested, but it is not supported. The NIC starts in RoCE v1.5 mode”, and RoCE connectivity doesn't work.

What is the best way to fix my problem?

Do ConnectX-3 adapters not work in RoCE 2.0 mode at all? Would a newer FW help? Today I have WinOF-5_35 with FW:

 

Image type:      FS2

FW Version:      2.40.5032

FW Release Date: 16.1.2017

Product Version: 02.40.50.32

Rom Info:        type=PXE version=3.4.747 devid=4099

Device ID:       4099

Description:     Node             Port1            Port2            Sys image

GUIDs:           ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff

MACs: e41d2ddfa540     e41d2ddfa541

VSD:

PSID:            MT_1080120023

 

 

I can't find a way to change the ConnectX-4 Lx adapters to RoCE 1.0 mode in a Microsoft Windows environment.
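
In case it matters, this is how I'm inspecting the adapter settings on the ConnectX-4 Lx nodes; the exact display name of the RoCE-related property depends on the WinOF-2 driver version, so I'm only filtering for likely names here:

Get-NetAdapterAdvancedProperty -Name "*" | Where-Object { $_.DisplayName -like "*RoCE*" -or $_.DisplayName -like "*NetworkDirect*" }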

Unknown symbol nvme_find_pdev_from_bdev


Hi all,

 

After installing MLNX_OFED_LINUX-4.4-1 on Ubuntu 18.04 (kernel 4.15.0-24) as "$ mlnxofedinstall --force --without-dkms --with-nvmf" I'm trying to use RDMA tools, but

 

- modprobe on nvme_rdma fails with "nvme_rdma: Unknown symbol nvme_delete_wq (err 0)"

- modprobe on nvmet_rdma fails with "nvmet: Unknown symbol nvme_find_pdev_from_bdev (err 0)"

 

What am I doing wrong, please?

 

I see two kernel modules loaded: nvme and nvme_core.

 

This is a Mellanox MCX516A-CCAT ConnectX-5 EN network interface card, 100GbE dual-port QSFP28.
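
In case it helps, this is how I'm checking which nvme modules modprobe is actually picking up (standard commands):

modinfo nvme_rdma | grep -E 'filename|vermagic'   # does it come from the MLNX_OFED tree or from the inbox kernel?
modinfo nvme      | grep filename
dkms status                                       # empty here, since I installed with --without-dkms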

 

Any inputs will be greatly appreciated

 

Thank you

Dmitri Fedorov

Ciena Canada

Re: Firmware for MHJH29 ?


> Service for this Hardware and FW has ended. So it will not be hosted on our site.

 

Thank you for your help. It seems this new firmware is no newer than the one I already have, unfortunately.

Those are 10+ year old cards... no surprise they are difficult to put back into service.

 

Cordially & thanks again for the great support,

What are some good budget options for NICs and Switch?


I have a limited budget, and I want to buy high-speed network interface cards and a switch to connect multiple PCs (just a couple of meters away from one another) into a local network for HPC research.

Some of the key points for me are:

  • It should be compatible with Windows
  • Speed preferably 56 gigabit
  • Needs to be able to sustain a 100% load at all times
  • Non-managed Switch
  • No SFP(+) uplinks

 

Taking all of that into account, what models should I look to buy? Perhaps some older models from a couple of years ago?

What are some caveats I should consider in building a system like that?

What should I expect in terms of latency?


SN2100B v3.6.8004


Hi All,

 

I don't know if someone has already posted something about this.

I just want to share that when we upgraded our SN2100 to v3.6.8004, the GUI didn't load. Luckily the switch was still accessible via SSH, so I changed the next-boot partition back to the previous working version and it rebooted fine. Now we can manage the switch both via the GUI and the CLI.
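
Roughly the commands I used (quoting from memory, so please double-check against the CLI guide for your version):

show images          # shows both partitions and which image is set for the next boot
image boot next      # select the other (previous, working) partition for the next boot
reload               # reboot into it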

 

Hope this helps someone who wants to upgrade to this version.

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA

$
0
0

I have encountered this problem, too.

It was because UCX was not compiled with CUDA support (the MLNX_OFED installer installs the default UCX).

When I recompiled UCX with CUDA and reinstalled it, it worked.
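
For reference, roughly what I did to rebuild it (paths are examples; adjust the CUDA and install prefixes to your system):

cd ucx
./autogen.sh                                                     # only needed for a git checkout
./contrib/configure-release --prefix=/usr --with-cuda=/usr/local/cuda
make -j"$(nproc)" && sudo make install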

Header Data Split


I've made a feeble attempt to utilise the Header Data Split (HDS) offload on ConnectX-5 adapters, by creating the striding WQ context with a non-zero log2_hds_buf_size value. However, the hardware won't have it and reports back a bad_param error with syndrome 0x6aaebb.

 

According to an online error syndrome list, this translates into human-readable form as:

create_rq/rmp: log2_hds_buf_size not supported

 

Since the Public PRM does not describe the HDS offload, I'm curious whether certain preconditions need to be met for this offload to work, or whether this is a known restriction in current firmware. I'd also like to know if it's possible to configure the HDS "level", that is, where the split happens (between L3/L4, L4/L5, ...).

 

The way I'd envision this feature to work is to zero-pad the end of headers up to log2_hds_buf_size, placing the upper layer payload at a fixed offset for any variable-size header length.

Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot


Hello Eric -

   I hope all is well...

You won't achieve a line rate of 56 Gb/s because the NIC is an MCB193A-FCAT (MT_1220110019) and your PCIe slot is 2.0.

And the release notes for your FW state:

Connect-IB® Host Channel Adapter, single-port QSFP, FDR 56Gb/s, PCIe3.0 x16, tall bracket, RoHS R6

 

So getting ~45-48 Gb/s is good.
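
As a rough back-of-the-envelope check (approximate numbers, just to show where the bottleneck is):

PCIe 2.0 x16:            16 lanes x 5 GT/s = 80 GT/s raw
8b/10b encoding:         80 x 8/10 = 64 Gb/s of data bandwidth
TLP / protocol overhead: roughly 50-55 Gb/s actually usable by the HCA

FDR's data rate is about 54 Gb/s, so a Gen2 x16 slot leaves essentially no headroom, and with transport overhead on top the ~45-48 Gb/s you measure is in the expected range.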

 

Have a great day!

Steve

RoCE v2 configuration with Linux drivers and packages


Is it possible to configure RoCE v2 with a ConnectX-4 card without MLNX_OFED? Can someone please share whether there is any guide/doc available for configuring it with the inbox Linux drivers and packages?

I tried to do it with the inbox drivers and packages but was not able to succeed. When I used MLNX_OFED, RoCE was configured successfully.
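
To make the question concrete, with the inbox drivers this is roughly what I was looking at (the device name mlx5_0 is an example; please correct me if this is the wrong mechanism):

grep . /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/* 2>/dev/null   # which GID indexes are RoCE v1 vs RoCE v2

# default RoCE version for rdma_cm applications, via configfs:
mount -t configfs none /sys/kernel/config 2>/dev/null
mkdir -p /sys/kernel/config/rdma_cm/mlx5_0
echo "RoCE v2" > /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode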
