Channel: Mellanox Interconnect Community: Message List
Viewing all 6227 articles

100G ethernet failing to load in kernel 3.17 (cx415A)

Hi,

 

I am trying to bring up the 100G Ethernet driver on RHEL with kernel 3.17, but it is not working.

 

1. I installed mlnx-en-3.3-1.0.0.0.tgz, but as a result all of our IB drivers were unloaded.

2. I installed MLNX_OFED_LINUX-3.3-1.0.4.0-rhel7.0-x86_64.iso. With it the IB drivers come up, but their APIs conflict with the APIs of our other current drivers.

 

My requirement is:

I can use any kernel from 3.17 to 4.0.0, and with that kernel I want to use the 100G Mellanox card (cx415A).

 

1. Will 100G work with the above kernels?

2. If it does not work, can you provide a patch to make it work with those kernels?

3. From which kernel version does 100G work properly without any issues?

 

Thanks

Rama


NFS over RoCE Ubuntu 16.04 with latest OFED

I'm having trouble with NFS over RoCE on Ubuntu 16.04 using the latest OFED (MLNX_OFED_LINUX-3.3-1.0.4.0-ubuntu16.04-x86_64.tgz)

 

Works with Inbox drivers (mostly) but not so much with the latest OFED

I managed to get NFS working with RoCE by following the docs on this site using the Inbox drivers for Ubuntu 16.04. I was having some little issues and I know the Ubuntu stuff is out of date so I wanted to install the latest OFED/mlx4 drivers, etc... as per recommendations on this site. So I did that. All went as planned. IP functionality is all there and RDMA tools/tests all work. The newest mlx4 driver is confirmed loaded and everything seems to work great. Except one thing.

 

Now I have a problem. The svcrdma and xprtrdma modules won't load. Thus no RDMA support for NFS. I get the following errors. I have a feeling this can be resolved somehow - like by recompiling kernel modules and such but that is over my head at the moment. Or maybe I just messed something up (crossing fingers)? Can anyone help?

NFS server:

# modprobe svcrdma
modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

dmesg errors:

[105699.696980] rpcrdma: Unknown symbol rdma_event_msg (err 0)
[105699.697056] rpcrdma: disagrees about version of symbol ib_create_cq
[105699.697059] rpcrdma: Unknown symbol ib_create_cq (err -22)
[105699.697069] rpcrdma: disagrees about version of symbol rdma_resolve_addr
[105699.697071] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
[105699.697183] rpcrdma: Unknown symbol ib_event_msg (err 0)
[105699.697213] rpcrdma: disagrees about version of symbol ib_dereg_mr
[105699.697215] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
[105699.697224] rpcrdma: disagrees about version of symbol ib_query_qp
[105699.697226] rpcrdma: Unknown symbol ib_query_qp (err -22)
[105699.697236] rpcrdma: disagrees about version of symbol rdma_disconnect
[105699.697238] rpcrdma: Unknown symbol rdma_disconnect (err -22)
[105699.697245] rpcrdma: disagrees about version of symbol ib_alloc_fmr
[105699.697247] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
[105699.697294] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
[105699.697295] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
[105699.697301] rpcrdma: disagrees about version of symbol rdma_resolve_route
[105699.697303] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
[105699.697398] rpcrdma: disagrees about version of symbol rdma_bind_addr
[105699.697400] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
[105699.697441] rpcrdma: disagrees about version of symbol rdma_create_qp
[105699.697443] rpcrdma: Unknown symbol rdma_create_qp (err -22)
[105699.697479] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
[105699.697487] rpcrdma: disagrees about version of symbol ib_destroy_cq
[105699.697489] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
[105699.697494] rpcrdma: disagrees about version of symbol rdma_create_id
[105699.697496] rpcrdma: Unknown symbol rdma_create_id (err -22)
[105699.697582] rpcrdma: disagrees about version of symbol rdma_listen
[105699.697584] rpcrdma: Unknown symbol rdma_listen (err -22)
[105699.697587] rpcrdma: disagrees about version of symbol rdma_destroy_qp
[105699.697589] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
[105699.697597] rpcrdma: disagrees about version of symbol ib_query_device
[105699.697599] rpcrdma: Unknown symbol ib_query_device (err -22)
[105699.697606] rpcrdma: disagrees about version of symbol ib_get_dma_mr
[105699.697607] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
[105699.697617] rpcrdma: disagrees about version of symbol ib_alloc_pd
[105699.697618] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
[105699.697673] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
[105699.697734] rpcrdma: disagrees about version of symbol rdma_connect
[105699.697736] rpcrdma: Unknown symbol rdma_connect (err -22)
[105699.697769] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
[105699.697842] rpcrdma: disagrees about version of symbol rdma_destroy_id
[105699.697844] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
[105699.697872] rpcrdma: disagrees about version of symbol rdma_accept
[105699.697874] rpcrdma: Unknown symbol rdma_accept (err -22)
[105699.697882] rpcrdma: disagrees about version of symbol ib_destroy_qp
[105699.697883] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
[105699.697964] rpcrdma: disagrees about version of symbol ib_dealloc_pd
[105699.697965] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)

 

NFS client:

# modprobe xprtrdma         
modprobe: ERROR: could not insert 'rpcrdma': Invalid argument

dmesg errors:

[106055.692454] rpcrdma: Unknown symbol rdma_event_msg (err 0)
[106055.692480] rpcrdma: disagrees about version of symbol ib_create_cq
[106055.692481] rpcrdma: Unknown symbol ib_create_cq (err -22)
[106055.692484] rpcrdma: disagrees about version of symbol rdma_resolve_addr
[106055.692485] rpcrdma: Unknown symbol rdma_resolve_addr (err -22)
[106055.692520] rpcrdma: Unknown symbol ib_event_msg (err 0)
[106055.692529] rpcrdma: disagrees about version of symbol ib_dereg_mr
[106055.692530] rpcrdma: Unknown symbol ib_dereg_mr (err -22)
[106055.692532] rpcrdma: disagrees about version of symbol ib_query_qp
[106055.692533] rpcrdma: Unknown symbol ib_query_qp (err -22)
[106055.692536] rpcrdma: disagrees about version of symbol rdma_disconnect
[106055.692536] rpcrdma: Unknown symbol rdma_disconnect (err -22)
[106055.692538] rpcrdma: disagrees about version of symbol ib_alloc_fmr
[106055.692539] rpcrdma: Unknown symbol ib_alloc_fmr (err -22)
[106055.692552] rpcrdma: disagrees about version of symbol ib_dealloc_fmr
[106055.692553] rpcrdma: Unknown symbol ib_dealloc_fmr (err -22)
[106055.692554] rpcrdma: disagrees about version of symbol rdma_resolve_route
[106055.692555] rpcrdma: Unknown symbol rdma_resolve_route (err -22)
[106055.692565] rpcrdma: disagrees about version of symbol rdma_bind_addr
[106055.692565] rpcrdma: Unknown symbol rdma_bind_addr (err -22)
[106055.692573] rpcrdma: disagrees about version of symbol rdma_create_qp
[106055.692574] rpcrdma: Unknown symbol rdma_create_qp (err -22)
[106055.692583] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
[106055.692585] rpcrdma: disagrees about version of symbol ib_destroy_cq
[106055.692585] rpcrdma: Unknown symbol ib_destroy_cq (err -22)
[106055.692587] rpcrdma: disagrees about version of symbol rdma_create_id
[106055.692587] rpcrdma: Unknown symbol rdma_create_id (err -22)
[106055.692613] rpcrdma: disagrees about version of symbol rdma_listen
[106055.692614] rpcrdma: Unknown symbol rdma_listen (err -22)
[106055.692615] rpcrdma: disagrees about version of symbol rdma_destroy_qp
[106055.692615] rpcrdma: Unknown symbol rdma_destroy_qp (err -22)
[106055.692617] rpcrdma: disagrees about version of symbol ib_query_device
[106055.692618] rpcrdma: Unknown symbol ib_query_device (err -22)
[106055.692619] rpcrdma: disagrees about version of symbol ib_get_dma_mr
[106055.692620] rpcrdma: Unknown symbol ib_get_dma_mr (err -22)
[106055.692622] rpcrdma: disagrees about version of symbol ib_alloc_pd
[106055.692623] rpcrdma: Unknown symbol ib_alloc_pd (err -22)
[106055.692638] rpcrdma: Unknown symbol ib_alloc_mr (err 0)
[106055.692657] rpcrdma: disagrees about version of symbol rdma_connect
[106055.692658] rpcrdma: Unknown symbol rdma_connect (err -22)
[106055.692668] rpcrdma: Unknown symbol ib_wc_status_msg (err 0)
[106055.692690] rpcrdma: disagrees about version of symbol rdma_destroy_id
[106055.692690] rpcrdma: Unknown symbol rdma_destroy_id (err -22)
[106055.692698] rpcrdma: disagrees about version of symbol rdma_accept
[106055.692699] rpcrdma: Unknown symbol rdma_accept (err -22)
[106055.692701] rpcrdma: disagrees about version of symbol ib_destroy_qp
[106055.692701] rpcrdma: Unknown symbol ib_destroy_qp (err -22)
[106055.692724] rpcrdma: disagrees about version of symbol ib_dealloc_pd
[106055.692725] rpcrdma: Unknown symbol ib_dealloc_pd (err -22)
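
The two kinds of dmesg lines above carry different information. "disagrees about version of symbol" comes from the kernel's symbol-version (CRC) check and means the symbol exists but was exported by a module built from different sources than rpcrdma expects. A bare "Unknown symbol ... (err 0)" means the symbol is not exported at all. A likely reading is that the distribution's rpcrdma was built against the Inbox ib_core and the OFED install replaced ib_core. The sketch below only sorts the log into those two groups. The function name and sample text are illustrative and are not part of any Mellanox tool.

```python
import re
from collections import defaultdict

# The two message shapes that appear in the dmesg output above.
MISMATCH = re.compile(r"(\w+): disagrees about version of symbol (\w+)")
UNKNOWN = re.compile(r"(\w+): Unknown symbol (\w+) \(err (-?\d+)\)")


def classify_symbol_errors(dmesg_text):
    """Split rejected symbols into version mismatches and symbols not exported."""
    mismatched = defaultdict(set)
    missing = defaultdict(set)
    for line in dmesg_text.splitlines():
        m = MISMATCH.search(line)
        if m:
            mismatched[m.group(1)].add(m.group(2))
            continue
        m = UNKNOWN.search(line)
        # Only err 0 lines are counted as missing; err -22 lines repeat a
        # symbol that was already reported as a version mismatch.
        if m and int(m.group(3)) == 0:
            missing[m.group(1)].add(m.group(2))
    return mismatched, missing


# A few lines copied from the server log above.
SAMPLE = """\
[105699.696980] rpcrdma: Unknown symbol rdma_event_msg (err 0)
[105699.697056] rpcrdma: disagrees about version of symbol ib_create_cq
[105699.697059] rpcrdma: Unknown symbol ib_create_cq (err -22)
[105699.697479] rpcrdma: Unknown symbol ib_map_mr_sg (err 0)
"""

if __name__ == "__main__":
    mismatched, missing = classify_symbol_errors(SAMPLE)
    for module in sorted(set(mismatched) | set(missing)):
        print(module, "version mismatch:", sorted(mismatched[module]))
        print(module, "not exported:", sorted(missing[module]))
```

Feeding it the full output of `dmesg` shows at a glance which symbols the installed ib_core no longer exports and which it exports with a different version.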

Change mlx5 to Ethernet mode without mlxconfig

I'm testing various network technologies (Omni-Path, iWARP, RoCE, ConnectX-3, ConnectX-4 and hopefully soon ConnectX-5) to find which fits our workload best. I'm writing a test suite that sets up and configures every aspect of the testing. When I went to script switching the ConnectX-3 cards from InfiniBand to Ethernet to test RoCE, I noticed that the recommended tool is connectx_port_config, but you can also run `echo eth > /sys/class/infiniband/mlx4_0/mlx4_0/device/device/mlx4_port1` to get the same effect, and it changes the mode of the card right away without a reboot.
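
For a test suite, the mlx4 sysfs write described above could be wrapped roughly as follows. This is a minimal sketch: the helper name is made up, the attribute path is passed in by the caller rather than guessed, and the accepted values ("ib", "eth", "auto") are my understanding of what the mlx4 driver parses, so check them against the driver version in use. Writing the real attribute requires root.

```python
from pathlib import Path

# Port types the mlx4 sysfs attribute is assumed to accept.
VALID_PORT_TYPES = ("ib", "eth", "auto")


def set_mlx4_port_type(attr_path, port_type):
    """Write a port type to an mlx4_portN sysfs attribute (hypothetical helper)."""
    if port_type not in VALID_PORT_TYPES:
        raise ValueError(f"unsupported port type: {port_type!r}")
    # Match what `echo eth > ...` does: the value followed by a newline.
    Path(attr_path).write_text(port_type + "\n")


def get_mlx4_port_type(attr_path):
    """Read back the current contents of the attribute."""
    return Path(attr_path).read_text().strip()
```

The caller would pass the same mlx4_port1 path used with `echo`, then read it back to confirm the change took effect.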

 

For the Connect-IB cards we have using the mlx5 drivers, that path does not exist and all I can find says to use mlxconfig and do what looks like writing to the firmware of the card then rebooting the machine. It also seems that the Ethernet code has been moved into the mlx5_core module so there isn't an mlx5_en module anymore. It seems that there should be some way to either write to sys like the mlx4 driver, or at least unload and reload the modules with some parameters to change the way the card works. I'm trying to use only in-box drivers and software on CentOS 7.2 otherwise it really complicates my test software being able to run on any freshly provisioned machine. The tests are between two machines without a switch so changing the modes isn't a big deal for these tests. I've tried scanning through sys looking for something that resembles some sort of switch like the mlx4 driver, but I can't find it.

 

Any ideas to switch the mlx5 mode without mlxconfig?

 

Thanks,

Robert LeBlanc

Re: FlexBoot linux IBoIP

Installation over IPoIB fails because RedHat/CentOS doesn't support it. Even if the IPoIB modules are integrated into the RAMFS, the installer will not recognize the IPoIB device as suitable for installation.

Re: Does ConnectX-3 support header split?

Hello Jeff,

 

There isn't a simple answer to your question, since it requires involvement from Development. I suggest you open a case at support@mellanox.com.

 

Thank you.

Cheers,

.R

Re: Mellanox ConnectX2 quad Rate IB Mezz card stopped functioning on Dell M710 blade

If this helps someone..

I did not receive any reply from anyone, so I went ahead and replaced the faulty card with a similar one and it worked without further reconfiguration.

Re: Mellanox eSwitchd issue on Openstack Kilo?

Hi all,

I'm still working on this problem.

Does anyone have the same problem, or a solution to it?

Any suggestions are welcomed. I'm totally lost.

Re: Mellanox eSwitchd issue on Openstack Kilo?

Hi Muneyoshi,

 

My apologies for the late reply.

 

If the problem is that ib0 is not up, it usually means that the port is not connected to a switch or to another IB port.

If executing "ifconfig ib0 up" does not bring it up, you have a problem with the cable connection.

 

Can you check if the cable is properly seated?

 

Thanks and regards,

~Martijn


ib_write_bw error connecting to server with rdma_cm option

I am trying to run an ib_write_bw test with the --rdma_cm (-R) option. The two hosts are connected back to back over InfiniBand.

I get an error connecting to the server. One host shows "Waiting for client to connect", while the other host throws an error:

host1:

>ib_write_bw -R 

************************************
* Waiting for client to connect... *
************************************

host2:

>ib_write_bw -R host1  
Received 10 times ADDR_ERROR
Unable to perform rdma_client function
Unable to init the socket connection

The test runs fine without the -R option. Any idea why the connection fails with rdma_cm?

Windows Server 2016 Expected Speeds?

We have a Mellanox ConnectX-3 NIC connected to a Mellanox 1012 12x40GE switch, running Windows Server 2016. We've used the beta driver provided by Mellanox but can't get anywhere close to wire speed. Has anyone else experienced low performance? I used iPerf3 to test the speed on two different nodes in a cluster, and could use another tool to confirm the speeds if anyone has recommendations.

 

 

OpenMPI with MXM 32bit issue

We have the problem that after 2.147 billion messages, which is the range of an int32_t, we cannot receive any more messages.

 

Compiling OpenMPI with the flag "--with-mxm=/path/to/mxm" causes this problem, while without this flag everything is fine. The problem is reproducible with the attached example code, by compiling and running it with the following commands:

$ /path/to/openmpi/bin/mpic++ openmpi_mxm_freeze.cxx -o openmpi_mxm_freeze

$ /path/to/openmpi/bin/mpirun -np 2 openmpi_mxm_freeze

 

Maybe the issue is connected with the following lines from "mxm_def.h":

typedef uint32_t             mxm_tag_t;/* MXM tag type */
typedef uint32_t             mxm_imm_t;/* MXM immediate data type */

 

The problem occurs with the newest Mellanox firmware, OFED package and OpenMPI version.
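
To make the arithmetic behind the report explicit: 2.147 billion is the maximum of a signed 32-bit integer, whereas the two typedefs quoted above are unsigned and would only run out at about 4.29 billion. That suggests, without proving it, that a signed 32-bit counter somewhere else in the stack is the more likely suspect. The sketch below only illustrates the two limits. It says nothing about where such a counter lives in MXM or OpenMPI, and signed overflow is formally undefined behavior in C, so the wrap shown is just the typical two's-complement result.

```python
import ctypes

INT32_MAX = 2**31 - 1    # 2,147,483,647: the "2.147 billion" in the report
UINT32_MAX = 2**32 - 1   # 4,294,967,295: the limit of a uint32_t


def next_int32(value):
    """Increment a counter and reduce it to a signed 32-bit value."""
    return ctypes.c_int32(value + 1).value


def next_uint32(value):
    """Increment a counter and reduce it to an unsigned 32-bit value."""
    return ctypes.c_uint32(value + 1).value


if __name__ == "__main__":
    # A signed counter goes negative one step past INT32_MAX ...
    print(next_int32(INT32_MAX))
    # ... while an unsigned one keeps counting until UINT32_MAX.
    print(next_uint32(INT32_MAX))
    print(next_uint32(UINT32_MAX))
```

A counter that turns negative would break any comparison of the form "received sequence number greater than the last one", which is consistent with receives stalling at exactly this point.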

Re: ib_write_bw error connecting to server with rdma_cm option

Hi Rak K,

Please try using the interface IP address (the Mellanox interface IP) instead of the host name on the client side.

 

Thanks,

Talat
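
One common reason a host name fails where an interface IP works is that the name resolves to an address on a different network (for example a management Ethernet port), which rdma_cm cannot map to an RDMA device. The sketch below checks where a name resolves. The helper names and the subnet are hypothetical, so substitute the subnet actually configured on the InfiniBand interfaces.

```python
import ipaddress
import socket


def resolve_ipv4(host):
    """Return the IPv4 addresses a host name resolves to."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def on_rdma_subnet(address, rdma_cidr):
    """True if the address lies inside the subnet used by the RDMA interfaces."""
    network = ipaddress.ip_network(rdma_cidr, strict=False)
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    # Hypothetical subnet of the IPoIB interfaces; replace with the real one.
    RDMA_SUBNET = "192.168.100.0/24"
    for addr in resolve_ipv4("localhost"):
        print(addr, "on RDMA subnet:", on_rdma_subnet(addr, RDMA_SUBNET))
```

If the server name resolves outside the RDMA subnet, passing the interface IP directly to `ib_write_bw -R` sidesteps the name lookup.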

krping (4.7 kernel version) crashing with mlx5_core (CX415A), it is passing with mlx4_core(CX354A)

Hi,

I am trying to run the krping code, which I got from the link below, on Linux kernel 4.7:

GitHub - larrystevenwise/krping: Kernel Mode RDMA Ping

 

With the 4.7 kernel I am able to use the mlx4 card (CX354A), but it fails with the mlx5 100G card (cx415A), with an error reported from mlx5_core. Please find the logs below, and let me know if you need any further information.

 

KRPING Client Log:

 

[  701.240641] krping_init
[  713.858099] krping: proc write |client,addr=192.168.69.127,port=9999,count=100|
[  713.858106] client
[  713.858109] ipaddr (192.168.69.127)
[  713.858112] port 9999
[  713.858114] count 100
[  713.858122] created cm_id ffff880242e9b000
[  713.858170] cma_event type 0 cma_id ffff880242e9b000 (parent)
[  713.858271] cma_event type 2 cma_id ffff880242e9b000 (parent)
[  713.858282] rdma_resolve_addr - rdma_resolve_route successful
[  713.858464] created pd ffff880095581cc0
[  713.858812] created cq ffff880252c53a00
[  713.859223] created qp ffff8802530a3800
[  713.859225] krping: krping_setup_buffers called on cb ffff880242e99c00
[  713.859316] krping: allocated & registered buffers...
[  713.861548] cma_event type 9 cma_id ffff880242e9b000 (parent)
[  713.861555] ESTABLISHED
[  713.861569] rdma_connect successful
[  713.861575] RDMA addr 24614b640 rkey 1563 len 64
[  713.861963] mlx5_0:dump_cqe:263:(pid 0): dump error cqe
[  713.861967] 00000000 00000000 00000000 00000000
[  713.861968] 00000000 00000000 00000000 00000000
[  713.861970] 00000000 00000000 00000000 00000000
[  713.861972] 00000000 92005204 0a0000b6 000055d2
[  713.861975] krping: cq completion failed with wr_id 0 status 4 opcode -1 vender_err 52
[  713.861987] krping: wait for RDMA_WRITE_ADV state 10
[  713.862082] krping_free_buffers called on cb ffff880242e99c00
[  713.863131] destroy cm_id ffff880242e9b000

 

KRPING Server Log:

 

[  816.542447] krping_init
[  846.465708] krping: proc write |server,addr=192.168.69.127,port=9999|
[  846.465717] server
[  846.465721] ipaddr (192.168.69.127)
[  846.465725] port 9999
[  846.465738] created cm_id ffff8804112ffc00
[  846.465750] rdma_bind_addr successful
[  846.465752] rdma_listen
[  852.111109] cma_event type 4 cma_id ffff880429de4800 (child)
[  852.111116] child cma ffff880429de4800
[  852.111379] created pd ffff8804112ed440
[  852.111594] created cq ffff88041470c800
[  852.112100] created qp ffff880429205800
[  852.112106] krping: krping_setup_buffers called on cb ffff8804112fe800
[  852.112216] krping: allocated & registered buffers...
[  852.112218] accepting client connection request
[  852.113286] cma_event type 9 cma_id ffff880429de4800 (child)
[  852.113290] ESTABLISHED
[  852.113813] cma_event type 10 cma_id ffff880429de4800 (child)
[  852.113818] krping: DISCONNECT EVENT...
[  852.113894] krping: wait for CONNECTED state 10
[  852.113899] krping: connect error -1
[  852.113903] krping_free_buffers called on cb ffff8804112fe800
[  852.115259] destroy cm_id ffff8804112ffc00

[root@xhdipsnvme1 krping-master]# cat /proc/krping
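
The numeric codes in the logs above can be turned into names with a small lookup. The tables below are transcribed from memory of `enum rdma_cm_event_type` (include/rdma/rdma_cm.h) and the first entries of `enum ib_wc_status` (include/rdma/ib_verbs.h), so verify them against the headers of the kernel in use. The `annotate` helper is illustrative only. Read this way, the client's "status 4" is a local protection error, which usually means a work request referenced memory not covered by the key or memory region it named. That interpretation is an assumption and not a confirmed diagnosis.

```python
import re

# Assumed values of enum rdma_cm_event_type; verify against your kernel headers.
RDMA_CM_EVENTS = {
    0: "ADDR_RESOLVED", 1: "ADDR_ERROR", 2: "ROUTE_RESOLVED", 3: "ROUTE_ERROR",
    4: "CONNECT_REQUEST", 5: "CONNECT_RESPONSE", 6: "CONNECT_ERROR",
    7: "UNREACHABLE", 8: "REJECTED", 9: "ESTABLISHED", 10: "DISCONNECTED",
}

# Assumed first entries of enum ib_wc_status; verify against your kernel headers.
IB_WC_STATUS = {
    0: "SUCCESS", 1: "LOC_LEN_ERR", 2: "LOC_QP_OP_ERR",
    3: "LOC_EEC_OP_ERR", 4: "LOC_PROT_ERR", 5: "WR_FLUSH_ERR",
}

CMA_EVENT = re.compile(r"cma_event type (\d+)")
WC_STATUS = re.compile(r"status (\d+) opcode")


def annotate(line):
    """Append the symbolic name of a CM event or completion status, if any."""
    m = CMA_EVENT.search(line)
    if m:
        name = RDMA_CM_EVENTS.get(int(m.group(1)), "UNKNOWN")
        return f"{line}  <- RDMA_CM_EVENT_{name}"
    m = WC_STATUS.search(line)
    if m:
        name = IB_WC_STATUS.get(int(m.group(1)), "UNKNOWN")
        return f"{line}  <- IB_WC_{name}"
    return line


if __name__ == "__main__":
    for log_line in (
        "[  852.111109] cma_event type 4 cma_id ffff880429de4800 (child)",
        "[  713.861975] krping: cq completion failed with wr_id 0 status 4 opcode -1 vender_err 52",
    ):
        print(annotate(log_line))
```

The event names agree with the log itself: type 9 is followed by "ESTABLISHED" and type 10 by "DISCONNECT EVENT".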

Re: ib_write_bw error connecting to server with rdma_cm option

talatb, I get the same error with the IP address. Is any other setup required?

Re: 40 GbE External Metallic Loopback Plug?

Hi David,

 

We don't manufacture 40GbE loopback cables and we don't have any recommendation for any of these cables.

 

Thanks

Khwaja


Re: ib_write_bw error connecting to server with rdma_cm option

Hi Rak,

Are the rdma_ucm and rdma_cm modules loaded?

Could you please send the setup information: driver, firmware, device, and operating system?

Re: CQE completed in error - vendor syndrom:216 syndrom:2

The issue might be related to the PCI bus, but it is difficult to say without a deeper analysis of the whole source code, including the OS. I recommend adding more prints in FreeBSD and in your code, and following them to see whether you get the same flow on both systems. Hopefully, you will see some differences in the logic.

On the other hand, can you check if using the latest firmware helps?

Re: Does MCX3141 only support ubuntu12.04? whether support 14.04 &15.04&15.10 and other new version? more questions: Does MCX3141 support SR-IOV for ROCEv2 in which ubuntu version?


Re: 100G ethernet failing to load in kernel 3.17 (cx415A)

To get a working configuration, check the MOFED release notes for the supported OS/kernel matrix, available from the Mellanox site:

http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_3_3-1_0_4_0.pdf

If the kernel you are using is the one provided by the OS vendor, then it is supported and should work. If the system has a mixed configuration, for example the OS itself is RedHat/SLES/Debian based but the kernel is from www.kernel.org, the combination might not work.

For more details, check release notes.
