Channel: Mellanox Interconnect Community: Message List

Re: sr-iov and vxlan used


mst start fails with ConnectX-4 on ppc64le


Hi,

 

I'm trying to set up VFs using SR-IOV on a ppc64le machine.

 

$ lsb_release -a

No LSB modules are available.

Distributor ID: Ubuntu

Description:    Ubuntu 16.04.4 LTS

Release:        16.04

Codename:       xenial


$ uname -a

 

Linux p006n03 4.10.0-35-generic #39~16.04.1-Ubuntu SMP Wed Sep 13 08:59:44 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

 

$ lspci | grep Mellanox

0000:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

0040:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

 

First I installed the MLNX_OFED driver following the steps at: https://community.mellanox.com/docs/DOC-2688

Then I installed the latest MFT (4.10.0) for ppc64le from here: http://www.mellanox.com/page/management_tools

 

However, running "mst start" afterwards fails:

 

$ sudo mst start

Starting MST (Mellanox Software Tools) driver set

Loading MST PCI module - Success

Loading MST PCI configuration module - Success

Create devices

/usr/bin/mst: line 382: 13070 Segmentation fault      (core dumped) ${mbindir}/minit $fullname ${busdevfn} 88 92

cat: /dev/mst/mt4115_pci_cr0: No such file or directory

/usr/bin/mst: line 382: 13132 Segmentation fault      (core dumped) ${mbindir}/minit $fullname ${busdevfn} 88 92

cat: /dev/mst/mt4115_pci_cr1: No such file or directory

Unloading MST PCI module (unused) - Success

 

Unloading MST PCI configuration module (unused) - Success

 

What could be the reason for this error?

 

I ultimately want to enable VFs on the CX4 following the steps here: https://community.mellanox.com/docs/DOC-2386, but I cannot proceed because of this error.

Re: mst start fails with ConnectX-4 on ppc64le

Re: Yocto embedded build of rdma-core


The solution to this problem was to use the recipes included in the updated openembedded build. About a month ago, rdma-core was added to the mainline tree. We had been trying to get this to work ourselves by writing our own recipes. Now that the code is integrated, it just builds.

rxe driver does not support kernel ABI


I'm getting a small error when I try to run an rping test. I'm building rxe into kernel 4.16 and rdma-core using Yocto on an Arria10 socfpga containing a dual-core A53 ARM processor. The kernel modules and userland load:

 

root@arria10:~# lsmod | grep rxe
rdma_rxe 102400 0
ib_core 192512 6 rdma_rxe,ib_cm,rdma_cm,ib_uverbs,iw_cm,rdma_ucm

 

I can configure the rxe0 device, but rxe_cfg gives a strange error:

 

root@arria10:~# rxe_cfg
libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0
IB device 'rxe0' wasn't found
Name Link Driver Speed NMTU IPv4_addr RDEV RMTU
eth0 yes  st_gmac      1500 10.0.1.24 rxe0 (?)

 

Any hints on what this kernel ABI error means would be appreciated!

 

Thanks,

FM

Re: rxe driver does not support kernel ABI


After setting up the yocto build to include the various rdma-core modules according to yocto practices, this error went away.

Re: rxe driver does not support kernel ABI


It's back. For some reason I keep getting this warning:

libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0

Re: ConnectX-3 RoCE mode


Karen,

 

Thanks for replying and for the reference doc.


Re: sr-iov and vxlan used

Re: mst start fails with ConnectX-4 on ppc64le


Hi Karen,

 

Thanks for your response. I do have the Advanced Toolchain Runtime installed.

 

$ sudo apt list --installed | grep advance-toolchain

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

advance-toolchain-at10.0-devel/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-mcore-libs/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-perf/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-runtime/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at7.1-devel/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-mcore-libs/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-perf/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-runtime/trusty,now 7.1-5 ppc64el [installed]

 

I did the export as mentioned (libc.so.6 exists on my system):

 

$ echo $LD_PRELOAD

/lib/powerpc64le-linux-gnu/libc.so.6

 

I still see the error however.

 

Running ${mbindir}/minit from /usr/bin/mst gives a segmentation fault (as seen in the logs from my previous message). I'm not sure why that happens.

Re: mst start fails with ConnectX-4 on ppc64le


Thank you Sood,

Please open a support ticket with the details so we can further investigate.

You can open a ticket by sending us an email to support@mellanox.com

 

Regards,

Karen.

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

Can you give more details on what you tried and what you used?

 

Thanks

Marc

Web interface error on SX6036


I am trying to set up an SX6036 VPI switch that was previously used at another institute. I've configured the mgmt interface and can connect to the web UI; however, it immediately gives the following error:

 

Internal Error

An internal error has occurred.

Your options from this point are:

See the logs for more details.

Return to the home page.

Retry the bad page which gave the error.

 

 

When I enable logging monitor and try to log in I see the following on the terminal:

 

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_include_template(), web_template.c:364, build 1: can't use empty string as operand of "!"

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Error in template "status-logs" at line 545 of the generated TCL code

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_render_template(), web_template.c:226, build 1: Error code 14002 (assertion failed) returned

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: main(), rh_main.c:337, build 1: Error code 14002 (assertion failed) returned

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Request handler failed with error code 14002: assertion failed

Jul 23 11:34:29 ib-switch httpd[4535]: [Mon Jul 23 11:34:29 2018] [error] [client ipremvd] Exited with error code 14002: assertion failed, referer: http://ip.removed./admin/launch?script=rh&template=failure&badpage=%2Fadmin%2Flaunch%3Fscript%3Drh%26template%3Dstatus-logs

 

 

Any idea how to check what may have failed and how to fix it?

 

regards

Andrew

Re: rxe driver does not support kernel ABI


I traced this to the function match_device() in libibverbs/init.c

 

There is a check for ABI versions:

 

if (sysfs_dev->abi_ver < ops->match_min_abi_version ||
    sysfs_dev->abi_ver > ops->match_max_abi_version) {
        fprintf(stderr, PFX
                "Warning: Driver %s does not support the kernel ABI of %u (supports %u to %u) for device %s\n",

 

The variable sysfs_dev is passed into this call by another routine, try_driver(), which is called by try_drivers(), which is called by try_all_drivers(), which appears to be called by ibverbs_get_device_list().

 

Does this help?

Re: rxe driver does not support kernel ABI


It appears that the abi version is stored here:

root@arria10:/sys/class/infiniband# cat rxe0/device/infiniband_verbs/uverbs0/abi_version

1

And this needs to be 2 according to the code...
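
The range test described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not the libibverbs implementation: the function names read_abi_version, abi_supported and describe are hypothetical, the supported range (2 to 2) is copied from the warning text, and the sysfs path is the one shown in this post.

```python
from pathlib import Path


def read_abi_version(path):
    """Return the integer stored in a sysfs abi_version file."""
    return int(Path(path).read_text().strip())


def abi_supported(kernel_abi, min_abi, max_abi):
    """Same shape as the quoted check: the ABI must lie inside [min_abi, max_abi]."""
    return min_abi <= kernel_abi <= max_abi


def describe(driver, kernel_abi, min_abi, max_abi, device):
    """Build a one-line status message modelled on the libibverbs warning."""
    if abi_supported(kernel_abi, min_abi, max_abi):
        return f"Driver {driver} supports kernel ABI {kernel_abi} for device {device}"
    return (f"Warning: Driver {driver} does not support the kernel ABI of "
            f"{kernel_abi} (supports {min_abi} to {max_abi}) for device {device}")


if __name__ == "__main__":
    # Path as shown in this post; it will differ on other systems.
    abi_file = Path("/sys/class/infiniband/rxe0/device/"
                    "infiniband_verbs/uverbs0/abi_version")
    if abi_file.exists():
        # Range 2..2 is an assumption taken from the warning message.
        print(describe("rxe", read_abi_version(abi_file), 2, 2,
                       "/sys/class/infiniband/rxe0"))
    else:
        print(f"{abi_file} not found")
```

With the value 1 read from sysfs and a supported range of 2 to 2, the check fails, which matches the warning seen on the board.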


How do I configure teaming in Server 2008 R2?


Hi All,

 

I have a couple of older Server 2008 R2 boxes that have ConnectX-3 Pro dual-port cards in them. I need to build LACP teams for my new network, but teaming doesn't appear to exist within the Mellanox WinOF driver, and Microsoft Teaming didn't exist yet in Server 2008 R2.

 

How am I supposed to configure these cards in LACP Teams?

 

Thanks

 

C

Re: rxe driver does not support kernel ABI


I went to kernel 4.17 and this went away.

Various ping programs segfaulting


I have a build of rdma-core on kernel 4.17 using Yocto for an Altera Arria10 with a dual-core A53 ARM processor. The system is built and rxe configures correctly, i.e. I can run rxe_cfg start and rxe_cfg add eth0, and ibv_devices looks good:

 

root@arria10:~# rxe_cfg status

  Name  Link  Driver   Speed  NMTU  IPv4_addr  RDEV  RMTU

  eth0  yes   st_gmac         1500  10.0.1.28  rxe0  1024  (3)

root@arria10:~# ibv_devices

    device                 node GUID

    ------              ----------------

    rxe0                085697fffec1059b

root@arria10:~# ibv_devinfo rxe0

hca_id: rxe0

        transport:                      InfiniBand (0)

        fw_ver:                         0.0.0

        node_guid:                      0856:97ff:fec1:059b

        sys_image_guid:                 0000:0000:0000:0000

        vendor_id:                      0x0000

        vendor_part_id:                 0

        hw_ver:                         0x0

        phys_port_cnt:                  1

                port:   1

                        state:                  PORT_ACTIVE (4)

                        max_mtu:                4096 (5)

                        active_mtu:             1024 (3)

                        sm_lid:                 0

                        port_lid:               0

                        port_lmc:               0x00

                        link_layer:             Ethernet

 

This all looks good.  However, when I try to ping this machine against a PC running rdma-core, I'm getting some strange errors including a segfault when the Arria10 acts as server for udaddy.

 

root@arria10:~# udaddy -s 10.0.1.16

udaddy: starting client

[ 1883.526301] rdma_rxe: null vaddr

udaddy: connecting

failed to reg MR

udaddy: failed to create messages: -1

test complete

Segmentation fault

 

I traced the first error, "rdma_rxe: null vaddr", to rxe_mem_init_user() in <kernel>/drivers/infiniband/sw/rxe/rxe_mr.c. It appears that a page address lookup, perhaps a virtual-to-physical translation, is failing. Any thoughts on how to solve this?

 

Thanks,

FM

Write op with more than 1024 B (MTU) fails in SoftRoCE mode


When the write message length is more than 1024 B (the MTU), the operation fails in SoftRoCE mode. Please help check why.

Testing with the standard tool ib_write_lat: "ib_write_lat -s 1024 -n 5" works, but "ib_write_lat -s 1025 -n 5" fails.

My SoftRoCE version is the one in "Red Hat Enterprise Linux Server release 7.4 (Maipo)".

Is this a bug in SoftRoCE? Thanks!
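
For reference, a message larger than the path MTU is carried in more than one packet, so 1025 bytes is the first size that needs two packets at an MTU of 1024. The sketch below only shows that arithmetic. It does not explain the failure, and the helper name packets_for_message is hypothetical.

```python
def packets_for_message(size_bytes, mtu_bytes):
    """Number of packets needed to carry a message of size_bytes.

    A zero-length message still occupies one packet.
    """
    if mtu_bytes <= 0:
        raise ValueError("mtu_bytes must be positive")
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    # Ceiling division without floating point.
    return max(1, -(-size_bytes // mtu_bytes))


if __name__ == "__main__":
    mtu = 1024  # MTU stated in the post
    for size in (1024, 1025):
        print(size, packets_for_message(size, mtu))
```

If the failure starts exactly where the packet count goes from one to two, the multi-packet path is a reasonable place to start looking.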

Re: "Priority trust-mode is not supported on your system"?


Hi, Marc.

 

I installed MLNX_OFED_LINUX-4.1-1.0.2.0 on my server and used the provided tool "mlnx_qos" to set the trust mode for a ConnectX-3 Pro.

The command is "mlnx_qos -i p4p1 --trust=dscp".

The result is "Priority trust mode is not supported on your system".

 

Thanks
