Channel: Mellanox Interconnect Community: Message List

Re: How to build openvswitch with dpdk for ConnectX-3 NICs?


A shared library build should work, but without the combined option.

We had issues with Ubuntu that were fixed in DPDK 2.2.

What OS do you have and what is not working?


Re: How to build openvswitch with dpdk for ConnectX-3 NICs?


CentOS 7.1.

I'm trying to follow the instructions from Features/vhost-user-ovs-dpdk - QEMU

with the Mellanox PMD tarball from http://www.mellanox.com/downloads/Drivers/MLNX_DPDK-2.1_1.1.tar.gz

 

Enabling these two parameters

 

+CONFIG_RTE_BUILD_SHARED_LIB=y

+CONFIG_RTE_BUILD_COMBINE_LIBS=y

 

causes a build failure like this:

 

== Build drivers/net/mlx4
  CC mlx4.o
  LD librte_pmd_mlx4.so.1.1
  INSTALL-LIB librte_pmd_mlx4.so.1.1
MLX4: Not supported in a combined shared library
make[6]: *** [all] Error 1
make[5]: *** [mlx4] Error 2
make[4]: *** [net] Error 2
make[3]: *** [drivers] Error 2
make[2]: *** [all] Error 2
make[1]: *** [x86_64-native-linuxapp-gcc_install] Error 2
make: *** [install] Error 2

RDMA_CM_EVENT_ADDR_ERROR when running in RoCE mode


I have developed a test client/server application that uses the verbs library, and it seems to work well when my ConnectX-3 Pro cards are configured to use InfiniBand.

 

However, if I reconfigure the ports to Ethernet mode and try to use RoCE v1, my client always fails with the same error whenever I call rdma_resolve_addr(...): it generates RDMA_CM_EVENT_ADDR_ERROR, error: -2 (ENOENT).

 

If I use udaddy instead of my own application I see exactly the same error:

 

>strace -f -s 32 -x udaddy -s 192.168.0.100

...

open("/dev/infiniband/rdma_cm", O_RDWR|O_CLOEXEC) = 3

...

write(1, "udaddy: connecting\n", 19udaddy: connecting)    = 19

write(3,"\x15\x00\x00\x00\x10\x01\x00\x00\x00\x00\x00\x00\xd0\x07\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 280) = 280

write(3, "\x0c\x00\x00\x00\x08\x00\x48\x01\xa0\xbc\x4e\x2a\xff\x7f\x00\x00", 16) = 16

write(1, "udaddy: event: RDMA_CM_EVENT_ADD"..., 51udaddy: event: RDMA_CM_EVENT_ADDR_ERROR, error: -2) = 51

write(1, "test complete\n", 14test complete) = 14

write(3, "\x01\x00\x00\x00\x10\x00\x04\x00\x30\xc1\x4e\x2a\xff\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 24) = 24

close(3) = 0

write(1, "return status -2\n", 17return status -2) = 17

shutdown(4, 2 /* send and receive */)   = 0

close(4) = 0

exit_group(-2)  = ?

 

The ENOENT error seems to be coming from the rdma_cm kernel module in response to the RDMA_USER_CM_CMD_RESOLVE_ADDR command that is written to /dev/infiniband/rdma_cm - see the write(3,"\x15... call above.

 

Looking briefly at the rdma_cm code, the ENOENT error code typically seems to be returned when no matching entry is found in the GID cache.

 

Is there something I should be doing on my system to ensure that the GID cache is populated?
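
For reference, this is roughly how I have been checking the GID table so far (the sysfs layout is standard; mlx4_0 and port 1 are just my device, and eth2 is a placeholder for the Ethernet-mode port):

cat /sys/class/infiniband/mlx4_0/ports/1/gids/0   # first GID table entry
ip addr show eth2   # the net device must be up, since the RoCE GIDs are derived from its MAC (and addresses)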

 

The system is running RHEL6.6 with MLNX_OFED_LINUX-3.1-1.0.3-rhel6.6-x86_64 installed. 

 

Thanks.

 

-Ronnie

Re: Omni-Path vs. Mellanox


Hi Henry,

 

To overcome the performance limitations of today’s HPC systems we need an intelligent interconnect: the interconnect becomes a co-processor, offloading the CPU and increasing data center efficiency.

 

Intel Omni-Path is a no-offload, proprietary network product: the same old PathScale “InfiniPath” (and QLogic “TrueScale”) product, running at a higher network speed. It does not support RDMA, HPC offloads, cloud offloads, or any other network offloads. It requires the CPU to handle all network operations, which results in lower CPU efficiency (high overhead).

 

So who is Omni-Path good for? Intel: it will require users to buy more CPUs to try to overcome the lower data center efficiency.

And why does Intel push inferior network technology? To show value versus its CPU competitors (ARM, Power, etc.).

 

Mellanox InfiniBand delivers leading performance over Omni-Path's promises: a higher message rate, lower latency, lower power consumption, and an estimated 2X higher system performance and efficiency.

 

The Mellanox EDR solution is robust, working, and delivering scalable performance. Omni-Path is not.

 

Thanks,

Ophir.

mlnx_add_kernel_support.sh ofa-kernel build failure on sles12


I have been trying to use mlnx_add_kernel_support.sh to make a tgz with support for a custom kernel based on SLES 12, and it fails with an error when trying to build ofa-kernel:

 

...

objcopy: '/tmp/mlnx_iso.8082/OFED_topdir/BUILDROOT/mlnx-ofa_kernel-3.0-OFED.3.0.2.0.0.1.gea32cb7.x86_64/home/ronnie/kernel/initrd_files_compiled/lib/modules/3.12.44-52.10.1.NK_SLES12/updates/compat/mlx_compat.ko': No such file

*** ERROR: same build ID in nonidentical files!

        /usr/src/ofa_kernel/default/compat/mlx_compat.ko

   and  /home/ronnie/kernel/initrd_files_compiled/lib/modules/3.12.44-52.10.1.NK/updates/compat/mlx_compat.ko

error: Bad exit status from /var/tmp/rpm-tmp.MHzqCE (%install)

...

 

The error message seems to come from the error handling of the make_id_link() function in find-debuginfo.sh, which is used in RPM packaging.
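
If that check is indeed the culprit, one workaround I am considering (untested, and the macro comes from generic RPM packaging rather than the Mellanox documentation) is to disable debuginfo generation entirely, so that find-debuginfo.sh never runs during the rpmbuild that mlnx_add_kernel_support.sh performs internally:

echo '%debug_package %{nil}' >> ~/.rpmmacros   # honoured by every rpmbuild invocation for this user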

 

This occurs with both MLNX_OFED_LINUX-3.0-2.0.1-sles12sp0-x86_64 and MLNX_OFED_LINUX-3.1-1.0.3-sles12sp0-x86_64.

 

The build machine is running:

Linux sles12-ronnie-dev-01 3.12.39-47-default #1 SMP Thu Mar 26 13:21:16 UTC 2015 (a901594) x86_64 x86_64 x86_64 GNU/Linux

 

The same procedure used to work with MLNX_OFED_LINUX-3.0-2.0.1-sles11sp3-x86_64 and our sles11sp3 kernel.

 

Is there a workaround I can use to get the build to complete?

 

-Ronnie

 

P.S. I saw an older thread with a similar problem, which I replied to, but it wasn't exactly the same and it was marked as answered without a solution, so I thought I should also ask separately; apologies if this wasn't the right thing to do. The older thread is: Mellanox OFED 3.0 mlnx-ofa_kernel failed to build in sles12

Re: How to build openvswitch with dpdk for ConnectX-3 NICs?


Please use:

+CONFIG_RTE_BUILD_SHARED_LIB=y

+CONFIG_RTE_BUILD_COMBINE_LIBS=n
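
Then rebuild from a clean tree, for example with the usual DPDK 2.x invocation (the target name is taken from the error log above):

rm -rf x86_64-native-linuxapp-gcc           # remove the old build directory
make install T=x86_64-native-linuxapp-gcc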


Re: Is it possible to send frames containing L2 CoS and/or L3 DSCP without getting involved in IB or RoCE considerations?


Hi,

What cards/switches do you have? What OS is this, Linux?

So you don't want to use RDMA, just pure Ethernet traffic with TCP, let's say?

 

See here, for example:

HowTo Run RoCE and TCP over L2 Enabled with PFC

 

Focus on the Web Services VLAN, and see the examples that enable the traffic over priority 0, for instance; you can change the priority as well.
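
For illustration only (this is not from the linked post; eth2 and VLAN 100 are placeholders), the Linux side of mapping traffic to an 802.1p priority looks roughly like this:

ip link add link eth2 name eth2.100 type vlan id 100 egress-qos-map 0:0 4:4 5:5
# applications then pick the L2 priority per socket, e.g.
# setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));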

 

Ophir.

Re: HPC-X Licensing Cost


Hi Rahul,

HPC-X software is completely free to download and use; there are no licensing costs. This is part of the great benefit of choosing Mellanox: all of the packages supporting our offloading engines are included. By taking advantage of Mellanox and its offloading capabilities, you're freeing up the CPU to do more meaningful computation in your applications, which in turn increases your overall efficiency. HPC-X does not include a job scheduler, but there are several open-source schedulers to choose from that work well with HPC-X as the primary tool stack.

Re: flex nic not present on MT26468


Hi, of course I used this version; the problem is that kernel 3.14 is not supported by it. I tried porting it (problems with VLAN tagging on RX and with link detection, but the flex NICs were OK), but I would prefer to use the latest driver, either from the linux-stable tree or the Mellanox zip, and there I have the problem described above.

Re: Infiniband SX6036G/SX6018F and QLogic HP BLc 4X QDR IB Switch


Well, after some time I spoke with the HP people and they changed the mezzanine cards from QLogic (QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)) to Mellanox (Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)), and now it works: not at 3.2Gb/s (the throughput that I think QDR should get), but both ports come up and they are running fine. After the replacement of the mezzanine cards I also upgraded the firmware to the latest code available from the HP support site.

 

$ ibstat

CA 'mlx4_0'

    CA type: MT26428

    Number of ports: 2

    Firmware version: 2.9.1530

    Hardware version: b0

    Node GUID: 0xf452140300dd3294

    System image GUID: 0xf452140300dd3297

    Port 1:

        State: Active

        Physical state: LinkUp

        Rate: 40

        Base lid: 32

        LMC: 0

        SM lid: 2

        Capability mask: 0x02510868

        Port GUID: 0xf452140300dd3295

        Link layer: InfiniBand

    Port 2:

        State: Active

        Physical state: LinkUp

        Rate: 40

        Base lid: 33

        LMC: 0

        SM lid: 2

        Capability mask: 0x02510868

        Port GUID: 0xf452140300dd3296

        Link layer: InfiniBand
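
If it helps anyone verify the bandwidth as well, a quick perftest run should do it (mlx4_0 and the port number as reported by ibstat above; the server address is a placeholder):

ib_send_bw -d mlx4_0 -i 1               # on the server
ib_send_bw -d mlx4_0 -i 1 <server-ip>   # on the client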

Re: Is it possible to send frames containing L2 CoS and/or L3 DSCP without getting involved in IB or RoCE considerations?


The switch is SX6036. The OS is Ubuntu 12. The messages will primarily be large multicasts competing with messages of lower priority. I had already looked at the document you cited, which is where I got the impression that everything presumed the use of RoCE; I had difficulty determining what applied in its absence.

Re: Omni-Path vs. Mellanox


Hello all

 

Let me explain / ask for comments on several points:
1) Why doesn't Mellanox produce switch silicon with more than 36 ports? I mean switches with a full-matrix link between ports. I think that would be attractive when you want to build something other than a fat-tree topology. Why is the number of ports limited?

In Omni-Path (as I understood from the documentation), the number of ports is increased by creating super-ports, which are divided into 4 ports of smaller bandwidth each. Why does Mellanox not go down this path?


2) Is Intel creating a full vendor lock-in with Omni-Path + Intel processors (integrating all the controllers into the CPU)?

If so, what will Mellanox do?

Re: Is it possible to send frames containing L2 CoS and/or L3 DSCP without getting involved in IB or RoCE considerations?


Some additional questions have arisen from further experimentation with the existing (default) configuration:

 

· Although the VLAN tag is removed by the switch, not only does the priority seem to be ignored but also the DEI (drop-eligible-indicator) flag; i.e., everything seems to be dropped equally, regardless of L2 priority or DEI. Is there any configuration that supports DEI, or is it solely based on queue assignment?

· I see references to ethernet 1/1 with little or no explanation. Does this refer to eth1, queue 1? The only reason I'm guessing that is that there are 4 queues, and I seem to remember having seen 1/1, 1/2, 1/3, and 1/4.

· I seem to remember having seen that two of the queues (1,4?) allow dropping and two (2,3?) don't. Is this by design or through configuration? I don't recall having seen anything about configuring this in particular.

· Is there any advantage to using the web GUI in place of the CLI? I don't see that it clarifies anything or even corresponds to CLI operations in any obvious way.


Re: ConnectX4 and SRIOV : supporting VL ?


Hi Jerome David,

 

Sorry for the late reply.

 

It seems that at this point only 1 VL is supported with SR-IOV configured and enabled.

 

Thanks,

 

Vishal

Re: Is it possible to send frames containing L2 CoS and/or L3 DSCP without getting involved in IB or RoCE considerations?


I’ve enabled priorities 4 and 5 and still see no difference in the handling relative to priorities 0-3.

Re: Burn u-Boot mlnx-os to JFFS2 as .img file-


Hi to all in this thread,

 

I've successfully accessed the 1024MB of NAND on the switch, and after running nand info and nand bad from the CLI I found that the NAND boot image has 3 bad blocks, which is why the switch won't boot:

Device 0: NAND 1GiB 3,3V 8-bit, sector size is 128 KiB

 

Device 0 bad blocks:

039a0000

1068000

1424000

 

I have the commands to dump all the boot partitions and the NAND to a TFTP server; I'll examine them and compare that image to the one I have for MLNX-OS. It should be the same one.
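
For anyone following along, the dump sequence is roughly the standard U-Boot one (the offset, size, and server address below are placeholders rather than my real values):

nand read ${loadaddr} 0x0 0x100000                       # copy a NAND region into RAM
tftpput ${loadaddr} 0x100000 192.168.0.1:nand_dump.bin   # push it to the TFTP server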

 

Then I'll format, copy the new .img to the switch (excluding the main U-Boot one), and set the U-Boot boot partition to the NAND address. Effectively it's a NAND-hosted boot image, aka a RAM disk image. As it's a 284MB file it should fit in fine (image-PPC_M460EX-3.4.3002.img).

 

With any luck it will work.

I've also figured out that the .img file is created with the U-Boot utilities:

 

host% ./tools/mkimage -A sh -O linux -T ramdisk -a 0x8C800000 -n "ST40 Linux ramdisk" -d initrd.img /export/ramdisk.ub

 

I'm going to use the same tool to de-compile the image file and check that the memory addresses match.
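
mkimage can also print an existing image's header without unpacking it, which should be enough to check the load and entry addresses:

mkimage -l image-PPC_M460EX-3.4.3002.img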

 

Fingers crossed.

 

 

Re: Burn u-Boot mlnx-os to JFFS2 as .img file-


It should work.

I've mounted the .img file as JFFS2 on a Linux host, and I can see all the data/files (it looks like there are 3 partitions: 2 boot, 1 RAM disk). I'll examine further before I attempt the write.
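
For reference, roughly what I used to mount it (the mtdram sizes are from memory, picked to fit the ~284MB image; both module parameters are in KiB):

modprobe mtdram total_size=294912 erase_size=128   # RAM-backed MTD device
modprobe mtdblock                                  # exposes it as /dev/mtdblock0
dd if=image-PPC_M460EX-3.4.3002.img of=/dev/mtdblock0
mount -t jffs2 /dev/mtdblock0 /mnt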

 

It won't affect the existing U-Boot partitions, as I can point the boot environment variable at the RAM disk.

 

The commands I used previously overlooked this and were attempting to write to the flash, which is only 16MB.

 

I've also found all of the U-Boot memory addresses in all the RAM devices, and in the printenv output I can see, buried deep, references to:

 

/dev/mtdblock0

/dev/mtdblock1

/dev/mtdblock2

/dev/mtdblock3

/dev/mtdblock4

 

which are the physical partitions on the flash, NAND, etc.

So it's looking more positive.
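
On the switch's Linux side, /proc/mtd should list those same partitions with their names and sizes:

cat /proc/mtd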
