Channel: Mellanox Interconnect Community: Message List

Re: How to build openvswitch with dpdk for ConnectX-3 NICs?


A shared library build should work, but without the combined option.

We had issues with Ubuntu that were fixed in DPDK 2.2.

What OS do you have and what is not working?


Re: How to build openvswitch with dpdk for ConnectX-3 NICs?


CentOS 7.1.

I'm trying to follow the instructions from Features/vhost-user-ovs-dpdk - QEMU

with the Mellanox PMD tarball from http://www.mellanox.com/downloads/Drivers/MLNX_DPDK-2.1_1.1.tar.gz

 

Enabling these two parameters

 

+CONFIG_RTE_BUILD_SHARED_LIB=y

+CONFIG_RTE_BUILD_COMBINE_LIBS=y

 

causes a build failure like this:

 

== Build drivers/net/mlx4
  CC mlx4.o
  LD librte_pmd_mlx4.so.1.1
  INSTALL-LIB librte_pmd_mlx4.so.1.1
MLX4: Not supported in a combined shared library
make[6]: *** [all] Error 1
make[5]: *** [mlx4] Error 2
make[4]: *** [net] Error 2
make[3]: *** [drivers] Error 2
make[2]: *** [all] Error 2
make[1]: *** [x86_64-native-linuxapp-gcc_install] Error 2
make: *** [install] Error 2

RDMA_CM_EVENT_ADDR_ERROR when running in RoCE mode


I have developed a test client/server application that uses the verbs library, and it seems to work well when my ConnectX-3 Pro cards are configured to use InfiniBand.

 

However, if I reconfigure the ports to Ethernet mode and try to use RoCE v1, my client always fails with the same error whenever I call rdma_resolve_addr(...): it generates RDMA_CM_EVENT_ADDR_ERROR, error: -2 (ENOENT).

 

If I use udaddy instead of my own application I see exactly the same error:

 

>strace -f -s 32 -x udaddy -s 192.168.0.100

...

open("/dev/infiniband/rdma_cm", O_RDWR|O_CLOEXEC) = 3

...

write(1, "udaddy: connecting\n", 19udaddy: connecting)    = 19

write(3,"\x15\x00\x00\x00\x10\x01\x00\x00\x00\x00\x00\x00\xd0\x07\x00\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 280) = 280

write(3, "\x0c\x00\x00\x00\x08\x00\x48\x01\xa0\xbc\x4e\x2a\xff\x7f\x00\x00", 16) = 16

write(1, "udaddy: event: RDMA_CM_EVENT_ADD"..., 51udaddy: event: RDMA_CM_EVENT_ADDR_ERROR, error: -2) = 51

write(1, "test complete\n", 14test complete) = 14

write(3, "\x01\x00\x00\x00\x10\x00\x04\x00\x30\xc1\x4e\x2a\xff\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 24) = 24

close(3) = 0

write(1, "return status -2\n", 17return status -2) = 17

shutdown(4, 2 /* send and receive */)   = 0

close(4) = 0

exit_group(-2)  = ?

 

The ENOENT error seems to be coming from the rdma_cm kernel module in response to the RDMA_USER_CM_CMD_RESOLVE_ADDR command that is written to /dev/infiniband/rdma_cm - see the write(3,"\x15... call above.

 

Looking briefly at the rdma_cm code, the ENOENT error code typically seems to be returned when no matching entry is found in the GID cache.

 

Is there something I should be doing on my system to ensure that the GID cache is populated?
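
For reference, this is roughly how I have been checking the GID table so far (the sysfs layout is standard; mlx4_0 and port 1 are just my device, and eth2 is a placeholder for the Ethernet-mode port):

cat /sys/class/infiniband/mlx4_0/ports/1/gids/0   # first GID table entry
ip addr show eth2   # the net device must be up, since the RoCE GIDs are derived from its MAC (and addresses)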

 

The system is running RHEL6.6 with MLNX_OFED_LINUX-3.1-1.0.3-rhel6.6-x86_64 installed. 

 

Thanks.

 

-Ronnie

Re: Omni-Path vs. Mellanox


Hi Henry,

 

To overcome the performance limitations of today’s HPC systems we need an intelligent interconnect: the interconnect becomes a co-processor, offloading the CPU and increasing data center efficiency.

 

Intel Omni-Path is a no-offload, proprietary network product: the same old PathScale “InfiniPath” (and QLogic “TrueScale”) product, running at a higher network speed. It does not support RDMA, HPC offloads, cloud offloads, or any other network offloads. It requires the CPU to handle all network operations, which results in lower CPU efficiency (high overhead).

 

So who is Omni-Path good for? Intel: it will require users to buy more CPUs to try to overcome the lower data center efficiency.

And why does Intel push inferior network technology? To show value versus its CPU competitors (ARM, Power, etc.).

 

Mellanox InfiniBand delivers leading performance over Omni-Path's promises: a higher message rate, lower latency, lower power consumption, and an estimated 2X higher system performance and efficiency.

 

The Mellanox EDR solution is robust, working, and delivering scalable performance. Omni-Path is not.

 

Thanks,

Ophir.

mlnx_add_kernel_support.sh ofa-kernel build failure on sles12


I have been trying to use mlnx_add_kernel_support.sh to make a tgz with support for a custom kernel based on SLES 12, and it fails with an error when trying to build ofa-kernel:

 

...

objcopy: '/tmp/mlnx_iso.8082/OFED_topdir/BUILDROOT/mlnx-ofa_kernel-3.0-OFED.3.0.2.0.0.1.gea32cb7.x86_64/home/ronnie/kernel/initrd_files_compiled/lib/modules/3.12.44-52.10.1.NK_SLES12/updates/compat/mlx_compat.ko': No such file

*** ERROR: same build ID in nonidentical files!

        /usr/src/ofa_kernel/default/compat/mlx_compat.ko

   and  /home/ronnie/kernel/initrd_files_compiled/lib/modules/3.12.44-52.10.1.NK/updates/compat/mlx_compat.ko

error: Bad exit status from /var/tmp/rpm-tmp.MHzqCE (%install)

...

 

The error message seems to come from the error handling of the make_id_link() function in find-debuginfo.sh, which is used in RPM packaging.
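
If that check is indeed the culprit, one workaround I am considering (untested, and the macro comes from generic RPM packaging rather than the Mellanox documentation) is to disable debuginfo generation entirely, so that find-debuginfo.sh never runs during the rpmbuild that mlnx_add_kernel_support.sh performs internally:

echo '%debug_package %{nil}' >> ~/.rpmmacros   # honoured by every rpmbuild invocation for this user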

 

This occurs with both MLNX_OFED_LINUX-3.0-2.0.1-sles12sp0-x86_64 and MLNX_OFED_LINUX-3.1-1.0.3-sles12sp0-x86_64.

 

The build machine is running:

Linux sles12-ronnie-dev-01 3.12.39-47-default #1 SMP Thu Mar 26 13:21:16 UTC 2015 (a901594) x86_64 x86_64 x86_64 GNU/Linux

 

The same procedure used to work with MLNX_OFED_LINUX-3.0-2.0.1-sles11sp3-x86_64 and our sles11sp3 kernel.

 

Is there a workaround I can use to get the build to complete?

 

-Ronnie

 

P.S. I saw an older thread with a similar problem, which I replied to, but it wasn't exactly the same and it was marked as answered without a solution, so I thought I should also ask separately; apologies if this wasn't the right thing to do. The older thread is: Mellanox OFED 3.0 mlnx-ofa_kernel failed to build in sles12

Re: How to build openvswitch with dpdk for ConnectX-3 NICs?


Please use:

+CONFIG_RTE_BUILD_SHARED_LIB=y

+CONFIG_RTE_BUILD_COMBINE_LIBS=n
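
Then rebuild from a clean tree, for example with the usual DPDK 2.x invocation (the target name is taken from the error log above):

rm -rf x86_64-native-linuxapp-gcc           # remove the old build directory
make install T=x86_64-native-linuxapp-gcc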


Re: Is it possible to send frames containing L2 CoS and/or L3 DSCP without getting involved in IB or RoCE considerations?


Hi,

What cards/switches do you have? What OS is this, Linux?

So you don't want to use RDMA, just pure Ethernet traffic with TCP, let's say?

 

See here, for example:

HowTo Run RoCE and TCP over L2 Enabled with PFC

 

Focus on the Web Services VLAN, and see the examples that enable the traffic over priority 0, for instance; you can change the priority as well.
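
For illustration only (this is not from the linked post; eth2 and VLAN 100 are placeholders), the Linux side of mapping traffic to an 802.1p priority looks roughly like this:

ip link add link eth2 name eth2.100 type vlan id 100 egress-qos-map 0:0 4:4 5:5
# applications then pick the L2 priority per socket, e.g.
# setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));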

 

Ophir.

Re: HPC-X Licensing Cost


Hi Rahul,

HPC-X software is completely free to download and use; there are no licensing costs. This is part of the great benefit of choosing Mellanox: all of the packages supporting our offloading engines are included. By taking advantage of Mellanox and its offloading capabilities, you're freeing up the CPU to do more meaningful computation in your applications, which in turn increases your overall efficiency. HPC-X does not include a job scheduler, but there are several open-source schedulers to choose from that work well with HPC-X as the primary tool stack.

Re: flex nic not present on MT26468


Hi, of course I used this version; the problem is that kernel 3.14 is not supported by it. I tried porting it (problems with VLAN tagging on RX and with link detection, but the flex NICs were OK), but I would prefer to use the latest driver, either from the linux-stable tree or the Mellanox zip, and there I have the problem described above.

Re: Infiniband SX6036G/SX6018F and QLogic HP BLc 4X QDR IB Switch


Well, after some time I spoke with the HP people and they changed the mezzanine cards from QLogic (QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)) to Mellanox (Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)), and now it works: not at 3.2Gb/s (the throughput that I think QDR should get), but both ports come up and they are running fine. After the replacement of the mezzanine cards I also upgraded the firmware to the latest code available from the HP support site.

 

$ ibstat

CA 'mlx4_0'

    CA type: MT26428

    Number of ports: 2

    Firmware version: 2.9.1530

    Hardware version: b0

    Node GUID: 0xf452140300dd3294

    System image GUID: 0xf452140300dd3297

    Port 1:

        State: Active

        Physical state: LinkUp

        Rate: 40

        Base lid: 32

        LMC: 0

        SM lid: 2

        Capability mask: 0x02510868

        Port GUID: 0xf452140300dd3295

        Link layer: InfiniBand

    Port 2:

        State: Active

        Physical state: LinkUp

        Rate: 40

        Base lid: 33

        LMC: 0

        SM lid: 2

        Capability mask: 0x02510868

        Port GUID: 0xf452140300dd3296

        Link layer: InfiniBand
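
If it helps anyone verify the bandwidth as well, a quick perftest run should do it (mlx4_0 and the port number as reported by ibstat above; the server address is a placeholder):

ib_send_bw -d mlx4_0 -i 1               # on the server
ib_send_bw -d mlx4_0 -i 1 <server-ip>   # on the client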

Re: Is it possible to send frames containing L2 CoS and/or L3 DSCP without getting involved in IB or RoCE considerations?


The switch is SX6036. The OS is Ubuntu 12. The messages will primarily be large multicasts competing with messages of lower priority. I had already looked at the document you cited, which is where I got the impression that everything presumed the use of RoCE; I had difficulty determining what applied in its absence.

Re: Omni-Path vs. Mellanox


Hello all

 

Let me explain / ask for comments on several points:
1) Why doesn't Mellanox produce switch silicon with more than 36 ports? I mean switches with a full-matrix link between ports. I think that would be attractive when you want to build something other than a fat-tree topology. Why is the number of ports limited?

In Omni-Path (as I understood from the documentation), the number of ports is increased by creating super-ports, which are divided into 4 ports of smaller bandwidth each. Why does Mellanox not go down this path?


2) Is Intel creating a full vendor lock-in with Omni-Path + Intel processors (integrating all the controllers into the CPU)?

If so, what will Mellanox do?

Re: Is it possible to send frames containing L2 CoS and/or L3 DSCP without getting involved in IB or RoCE considerations?


Some additional questions have arisen from further experimentation with the existing (default) configuration:

 

· Although the VLAN tag is removed by the switch, not only does the priority seem to be ignored but also the DEI (drop-eligible-indicator) flag; i.e., everything seems to be dropped equally, regardless of L2 priority or DEI. Is there any configuration that supports DEI, or is it solely based on queue assignment?

· I see references to ethernet 1/1 with little or no explanation. Does this refer to eth1, queue 1? The only reason I'm guessing that is that there are 4 queues, and I seem to remember having seen 1/1, 1/2, 1/3, and 1/4.

· I seem to remember having seen that two of the queues (1,4?) allow dropping and two (2,3?) don't. Is this by design or through configuration? I don't recall having seen anything about configuring this in particular.

· Is there any advantage to using the web GUI in place of the CLI? I don't see that it clarifies anything or even corresponds to CLI operations in any obvious way.


Re: ConnectX4 and SRIOV : supporting VL ?


Hi Jerome David,

 

Sorry for the late reply.

 

It seems that at this point only 1 VL is supported with SR-IOV configured and enabled.

 

Thanks,

 

Vishal

Re: Is it possible to send frames containing L2 CoS and/or L3 DSCP without getting involved in IB or RoCE considerations?


I’ve enabled priorities 4 and 5 and still see no difference in the handling relative to priorities 0-3.

Re: Burn u-Boot mlnx-os to JFFS2 as .img file-


Hi to all in this thread,

 

I've successfully accessed the 1024MB of NAND on the switch, and after running nand info and nand bad from the CLI I found that the NAND boot image has 3 bad blocks, which is why the switch won't boot:

Device 0: NAND 1GiB 3,3V 8-bit, sector size is 128 KiB

 

Device 0 bad blocks:

039a0000

1068000

1424000

 

I have the commands to dump all the boot partitions and the NAND to a TFTP server; I'll examine them and compare that image to the one I have for MLNX-OS. It should be the same one.
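
For anyone following along, the dump sequence is roughly the standard U-Boot one (the offset, size, and server address below are placeholders rather than my real values):

nand read ${loadaddr} 0x0 0x100000                       # copy a NAND region into RAM
tftpput ${loadaddr} 0x100000 192.168.0.1:nand_dump.bin   # push it to the TFTP server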

 

Then I'll format, copy the new .img to the switch (excluding the main U-Boot one), and set the U-Boot boot partition to the NAND address. Effectively it's a NAND-hosted boot image, aka a RAM disk image. As it's a 284MB file it should fit in fine (image-PPC_M460EX-3.4.3002.img).

 

With any luck it will work.

I've also figured out that the .img file is created with the U-Boot utilities:

 

host% ./tools/mkimage -A sh -O linux -T ramdisk -a 0x8C800000 -n "ST40 Linux ramdisk" -d initrd.img /export/ramdisk.ub

 

I'm going to use the same tool to de-compile the image file and check that the memory addresses match.
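
mkimage can also print an existing image's header without unpacking it, which should be enough to check the load and entry addresses:

mkimage -l image-PPC_M460EX-3.4.3002.img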

 

Fingers crossed.

 

 

Re: Burn u-Boot mlnx-os to JFFS2 as .img file-


It should work.

I've mounted the .img file as JFFS2 on a Linux host, and I can see all the data/files (it looks like there are 3 partitions: 2 boot, 1 RAM disk). I'll examine further before I attempt the write.
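
For reference, roughly what I used to mount it (the mtdram sizes are from memory, picked to fit the ~284MB image; both module parameters are in KiB):

modprobe mtdram total_size=294912 erase_size=128   # RAM-backed MTD device
modprobe mtdblock                                  # exposes it as /dev/mtdblock0
dd if=image-PPC_M460EX-3.4.3002.img of=/dev/mtdblock0
mount -t jffs2 /dev/mtdblock0 /mnt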

 

It won't affect the existing U-Boot partitions, as I can point the boot environment variable at the RAM disk.

 

The commands I used previously overlooked this and were attempting to write to the flash, which is only 16MB.

 

I've also found all of the U-Boot memory addresses in all the RAM devices, and in the printenv output I can see, buried deep, references to:

 

/dev/mtdblock0

/dev/mtdblock1

/dev/mtdblock2

/dev/mtdblock3

/dev/mtdblock4

 

which are the physical partitions on the flash, NAND, etc.

So it's looking more positive.
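
On the switch's Linux side, /proc/mtd should list those same partitions with their names and sizes:

cat /proc/mtd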
