Channel: Mellanox Interconnect Community: Message List

Re: SX6710 Keeps Shutting Down


Hi Angelo,

 

If this issue is affecting your production environment, please email Mellanox support at support@mellanox.com so the issue can be properly examined.

 

Thanks,

Christeen


ib0: error fetching interface information: Device not found


Hi

 

I have an InfiniBand controller (Mellanox Technologies MT27600 [Connect-IB]) in HP blades running CentOS 6.8.

ifconfig ib0 keeps returning the error "ib0: error fetching interface information: Device not found".

 

I restarted the rdma service with no luck.

# service rdma status

Low level hardware support loaded:

        mlx5_ib mlx5_core

 

Upper layer protocol modules:

        ib_ipoib

 

User space access modules:

        rdma_ucm ib_ucm ib_uverbs ib_umad

 

Connection management modules:

        rdma_cm ib_cm iw_cm

 

Configured IPoIB interfaces: ib0

 

Currently active IPoIB interfaces: none
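
For completeness, a few basic checks that might narrow this down, assuming the module and device names from the service output above:

# confirm the IPoIB module is actually loaded
lsmod | grep ib_ipoib

# look for driver or port errors logged while the modules loaded
dmesg | grep -i -e mlx5 -e ipoib

# list the IB devices and network interfaces the kernel actually created
ls /sys/class/infiniband/
ls /sys/class/net/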

 

Thank you in advance

Re: SX6710 Keeps Shutting Down


Appreciate that, Christeen, but as I mentioned, it's in our lab. It's a pain in the butt to reboot all the time: sometimes it runs for 3 or 4 days with no issues, and other times we reboot it 4-5 times in one day. I'm pretty sure the answer is somewhere in the logs, but I'm just unfamiliar with them, and nothing jumps out when I search through them for an error.

MLNX OFED error for RHEL 7.3 RT


Hello,

I get an error when I try to install MLNX OFED for the RHEL 7.3 real-time kernel.

For the normal RHEL 7.3 kernel I can install OFED without any problems. What could be the problem? By the way, I installed kernel-devel and added --add-kernel-support, but I got the ERROR below during installation:

 

ERROR: Failed executing "MLNX_OFED_SRC-3.4-2.0.0.0/install.pl --tmpdir /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.2.2.rt56.424.el7.x86_64 --kernel-only --kernel 3.10.0-514.2.2.rt56.424.el7.x86_64 --kernel-sources /lib/modules/3.10.0-514.2.2.rt56.424.el7.x86_64/build --builddir /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.2.2.rt56.424.el7.x86_64/mlnx_iso.21610 --disable-kmp --force --build-only --distro rhel7.3"

ERROR: See /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.2.2.rt56.424.el7.x86_64/mlnx_ofed_iso.21610.log

Failed to build MLNX_OFED_LINUX for 3.10.0-514.2.2.rt56.424.el7.x86_64
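
For context, the install was invoked roughly like this (the ISO file name and mount point are illustrative, not the exact ones used):

# mount the MLNX_OFED ISO and rebuild the packages against the running RT kernel
mount -o ro,loop MLNX_OFED_LINUX-3.4-2.0.0.0-rhel7.3-x86_64.iso /mnt
cd /mnt
./mlnxofedinstall --add-kernel-support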

 

I have a Mellanox ConnectX-3 VPI.

What is the recommended firmware updating tool for Mellanox HCAs and/or NICs?


Happy New Year to all!

 

With the clusters that I manage, I have to take care of many different Mellanox IB and Ethernet switches, HCAs and NICs.  Some examples:

  • MSB7700-ES2F EDR IB 1U switches
  • MCX455A-ECAT ConnectX®-4 VPI EDR IB and 100GbE
  • MCB194A-FCAT Connect-IB® HCA dual-port FDR 56Gb/s
  • MCX383A-FCNA ConnectX®-3 VPI single-port FDR IB and 40GbE I/O cards
  • MSN2410-CB2F Spectrum 25G/100G 1U switches
  • MCX4121A-ACAT ConnectX®-4 Lx EN 25G PCIe 3.0 x8 cards
  • [...]

 

As a rule, I strive to keep their firmware current.  Here is an area where I would like Mellanox's input: what is the recommended tool to keep an HCA/NIC's firmware up to date?  As far as I can see, there are four options:

 

  1. If MLNX_OFED is used (as I do), the mlnxofedinstall Perl script has a few options to update an adapter's firmware.
  2. If sufficient inbox IB/RDMA packages are installed (e.g. the RHEL/CentOS "Infiniband Support" group), the mstflint package is available and can be used.
  3. Mellanox also provides the MFT suite.
  4. There is also Mellanox's mlxup.

 

Personally, I have come to the conclusion that mlxup is the simplest and most flexible.  But am I correct?  I'd love to hear Mellanox colleagues' comments.
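
To make the comparison concrete, here is a rough sketch of how options 2 and 4 are typically driven (the PCI device address and image file name are placeholders, and flags can differ between tool versions):

# option 2: mstflint against a specific PCI device, with a locally downloaded image
mstflint -d 04:00.0 query
mstflint -d 04:00.0 -i fw-ConnectX4-example.bin burn

# option 4: mlxup detects the installed adapters and offers matching updates
mlxup              # use locally available images
mlxup --online     # fetch the latest images from Mellanox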

 

Regards,

 

Chin

A few suggestions to Mellanox MLNX_OFED team


Having managed HPC clusters (exclusively servers running RHEL/CentOS) in the national lab environment for quite a few years now, and with extensive experience with both MLNX_OFED and its inbox counterparts, I would like to make a few suggestions:

 

  1. Although I like MLNX_OFED's enhancements and the fact that it's a bundle of "everything", the need to rebuild the RPMs for even minor OS updates is a bore!  As such, on most of the servers that I manage (and my colleagues' too), we still use inbox packages for IB-enabled servers.
  2. On servers that I manage, I tend to install the minimum number of packages.  This practice is simply logical: the less that is installed, the less can go wrong.  The "wholesale install" approach coded in mlnxofedinstall thus contradicts this best practice; I have to get into the code to "defeat" it.  BTW, I do the same with RHEL/CentOS - I NEVER do a yum -y groupinstall "Infiniband Support".  The majority of the packages are useless to a typical IB-enabled server!  Example: why would you need libcxgb3 and libcxgb4 on IB-enabled servers that use only Mellanox IB HCAs?  Is it a goodwill gesture to Chelsio?
  3. MLNX_OFED's mlnxofedinstall Perl script edits /etc/security/limits.conf directly, rather than putting a snippet in the /etc/security/limits.d subdirectory (see the sketch after this list).  Sure, this is minor, but if the distribution developers have decided that the best practice is to use snippets (/etc/sudoers.d and /etc/sysctl.d come to mind, and many more), why not follow this convention?
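
As an illustration of point 3, the same kind of memlock limits could live in a drop-in file instead of being appended to limits.conf (the file name is just an example):

# /etc/security/limits.d/99-mlnx-ofed.conf  (hypothetical drop-in)
* soft memlock unlimited
* hard memlock unlimited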

 

Regards,

 

Chin

Re: A few suggestions to Mellanox MLNX_OFED team


Hi Chin,

 

We definitely agree that in some cases not all the packages are needed, and some are even redundant; for that, we have the option of a "slim" installation.

To use it, run "mlnxofedinstall -p"; a conf file called /tmp/ofed-all.conf will then be created with all the packages included.

You can choose which packages you need and which you do not, then run the installation script as follows: ./mlnxofedinstall -c /tmp/ofed-all.conf. Only the packages that you chose will be installed.

This is one option.

Another option is to install the driver from source, meaning to install only the source RPMs from each release.

This can be done using the installation script inside the tarball located under the /src directory (MLNX_OFED_SRC).

Here you also have the -p and -c flags for an even slimmer installation, as sketched below.
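
Putting the steps together, the slim-install workflow looks roughly like this (edit the generated file to taste):

./mlnxofedinstall -p                      # step 1: writes /tmp/ofed-all.conf with every package listed
vi /tmp/ofed-all.conf                     # step 2: remove the packages you do not want
./mlnxofedinstall -c /tmp/ofed-all.conf   # step 3: install only what remains in the file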

 

Hope this information helps.

 

Regards,

Viki

Win server 2016 Switch Embedded Teaming (SET) and SR-IOV


Hi,

 

According to MS, in Windows Server 2016 RTM (Hyper-V), when I create a vSwitch with Switch Embedded Teaming I should be able to use SR-IOV for VMs.

Is this currently supported with ConnectX-4 and WinOF-2 (v1.50 at this time)?


Re: A few suggestions to Mellanox MLNX_OFED team


Hi Viki,

 

Thanks for sharing the useful tip.  Much obliged. I've learned something new in the New Year.

 

Best,

 

Chin

Re: "Protocol not supported" when trying to add rdma to nfs portlist


The solution was to remove MLNX_OFED and use the distribution's drivers/kernel modules.

Re: HCA extended port counters


On my machine, perfquery -x returns 64-bit values for the port counters, but I am unable to determine where these counters are exposed.  E.g., the counter /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data is only a 32-bit value and is maxed out at 4294967295.  According to the mlx5 docs there should be a counters_ext directory, but that is not present on my system.  Is there a way to enable that with mlx4, or how am I to get the correct value?

Cannot change port type on one port


I have a dual-port ConnectX-3 (HP branded).

When I open Device Manager > System Devices > Mellanox NIC > Port Protocol

Only Port 1 is available to change between IB, ETH and AUTO. Port 2 is greyed out.

 

When we installed the NIC in the server, both ports were set to IB; we then changed both to ETH. Now I need to change them back, with no luck.

 

I have tried reinstalling the driver, changing the settings with MLXTOOL, and restoring the NIC to defaults with PowerShell.
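
For reference, a sketch of the equivalent change from the command line with MFT's mlxconfig (the device name is a placeholder; on ConnectX-3 the LINK_TYPE values 1/2/3 correspond to IB/ETH/VPI auto):

mst status                                         # list MST devices, e.g. mt4099_pci_cr0
mlxconfig -d mt4099_pci_cr0 query                  # show the current LINK_TYPE_P1 / LINK_TYPE_P2
mlxconfig -d mt4099_pci_cr0 set LINK_TYPE_P2=1     # 1 = IB
# a reboot is needed for the new port protocol to take effect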

 

Anyone know what to do?

Re: Multiple MTU in routed vlans


Thanks for the response,

What kind of performance hit should I expect on an SX1024 due to packet fragmentation during inter-VLAN routing?

ConnectX-4 LX RoCE does not like latency


This is a diagram of my current setup.

 

                +----------------+
                |  Linux Router  |
                |   ConnectX-3   |
                | port 1  port 2 |
                +----------------+
                     /      \
+---------------+   /        \   +---------------+
|    Host 1     |  / A      A \  |    Host 2     |
| ConnectX-4-LX | /            \ | ConnectX-4-LX |
|        Port 1 |-              -| Port 1        |
|        Port 2 |----------------| Port 2        |
+---------------+        B       +---------------+

 

The Linux router has the ConnectX-3 (not Pro) card in Ethernet mode and uses a breakout cable (port 1 only) to connect to the ConnectX-4 Lx cards at 10 Gb as path 'A'. The second ports of the ConnectX-4 Lx cards are connected directly at 25 Gb as path 'B'. Hosts 1 & 2 are running CentOS 7.2 with kernel 3.10.0-327.36.3.el7.x86_64 and OFED 3.4. The Linux router is running CentOS 7.2 with a 4.9.0 kernel.

 

iSER and RDMA work fine over path 'B' and path 'A' (in either bridge or router mode), and now I want to add latency and drop packets to understand the effects. I'm using tc and netem to add the latency into the path. When I add 0.5 ms of latency in both directions, iSER slows to a crawl, throws errors in dmesg, and sometimes even causes the file system to go read-only. If I set the latency back to zero, things clear up and the full 10 Gb is achieved. iperf performs the same with the latency set to 0 or 0.5 ms in each direction. We would like to get RoCE to work over high-latency, high-bandwidth links. If someone has ideas on how to resolve this issue, I'd love to hear them.

 

Commands run on the router server:

for i in 2 3; do tc qdisc change dev eth${i} root netem delay .5ms; done

 

# brctl show
bridge name     bridge id               STP enabled     interfaces
rleblanc                8000.f452147ce541       no              eth2
                                                        eth3

 

The iSER target is a 100 GB RAM disk exported via iSER. I format the disk on the initiator with ext4 and then run this fio command:

echo "3" > /proc/sys/vm/drop_caches; fio --rw=read --bs=4K --size=1G --numjobs=40 --name=worker.matt --group_reporting

 

I see these messages on the initiator:

[25863.623453] 00000000 00000000 00000000 00000000

[25863.628564] 00000000 00000000 00000000 00000000

[25863.633634] 00000000 00000000 00000000 00000000

[25863.638619] 00000000 08007806 250003c7 0b0190d3

[25863.643593] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78

[25863.651180]  connection40:0: detected conn error (1011)

[25874.368881] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe

[25874.375619] 00000000 00000000 00000000 00000000

[25874.380690] 00000000 00000000 00000000 00000000

[25874.385712] 00000000 00000000 00000000 00000000

[25874.390693] 00000000 08007806 250003c8 0501ddd3

[25874.395681] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78

[25874.403283]  connection40:0: detected conn error (1011)

[25923.829903] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe

[25923.836663] 00000000 00000000 00000000 00000000

[25923.841724] 00000000 00000000 00000000 00000000

[25923.846752] 00000000 00000000 00000000 00000000

[25923.851733] 00000000 08007806 250003c9 510134d3

[25923.856709] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78

[25923.864308]  connection40:0: detected conn error (1011)

[25943.184313] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe

[25943.191079] 00000000 00000000 00000000 00000000

[25943.196208] 00000000 00000000 00000000 00000000

[25943.201287] 00000000 00000000 00000000 00000000

[25943.206281] 00000000 08007806 250003ca 1afdbdd3

[25943.211272] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78

[25943.218901]  connection40:0: detected conn error (1011)

[25962.538633] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe

[25962.545396] 00000000 00000000 00000000 00000000

[25962.550475] 00000000 00000000 00000000 00000000

[25962.555551] 00000000 00000000 00000000 00000000

[25962.560533] 00000000 08007806 250003cb 21012ed3

[25962.565526] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78

[25962.573155]  connection40:0: detected conn error (1011)

[25973.291038] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe

[25973.297861] 00000000 00000000 00000000 00000000

[25973.302978] 00000000 00000000 00000000 00000000

[25973.308025] 00000000 00000000 00000000 00000000

[25973.313014] 00000000 08007806 250003cc 1901d2d3

[25973.318004] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78

[25973.325601]  connection40:0: detected conn error (1011)

[26039.955899] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe

[26039.962690] 00000000 00000000 00000000 00000000

[26039.967825] 00000000 00000000 00000000 00000000

[26039.972894] 00000000 00000000 00000000 00000000

[26039.977891] 00000000 08007806 250003cd 850172d3

[26039.982905] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78

[26039.990512]  connection40:0: detected conn error (1011)

[26067.411753] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe

[26067.418598] 00000000 00000000 00000000 00000000

[26067.423733] 00000000 00000000 00000000 00000000

[26067.428832] 00000000 00000000 00000000 00000000

[26067.433826] 00000000 08007806 250003ce 092977d3

[26067.438818] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78

[26067.446462]  connection40:0: detected conn error (1011)

 

 

There are no messages on the target server.

ConnectX3 directly connected without switch


Hello,

I have two Windows servers directly connected with Mellanox ConnectX-3 adapters, with no switch infrastructure between them.

Is this a supported scenario with functioning RDMA/RoCE?

If so, do I still need to implement Data Center Bridging in Windows, or is that only required when switches are in play?


Re: Win server 2016 Switch Embedded Teaming (SET) and SR-IOV


Starting from Windows 2012 and later (including Windows 2016, of course), all teaming drivers and support are provided by the native Microsoft OS NetLBFO.

Mellanox is not involved in providing a module, package, etc., so it is up to MS whether CX4 or any other adapter is in their support compatibility matrix.

See https://technet.microsoft.com/en-us/library/jj130849.aspx

See also the relevant MSDN documentation on that:
Learn to Develop with Microsoft Developer Network | MSDN

Re: MLNX OFED error for RHEL 7.3 RT


Hi Oskar,

 

Can you provide the contents of the logfile that is referenced inside /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.2.2.rt56.424.el7.x86_64/mlnx_ofed_iso.21610.log?

The logfile name should contain *.rpmbuild.log

 

Thanks and regards,

~Martijn 

Re: Win server 2016 Switch Embedded Teaming (SET) and SR-IOV


Aviap,

 

SR-IOV is not supported on a SET team.

Re: HCA extended port counters


If a device's PMA supports the extended port counters (which is the case here), it depends on which kernel is being used. There were recent kernel changes to utilize the optional PortCountersExtended attribute rather than the mandatory PortCounters. So either a recent kernel with these changes is needed to see this, or the relevant changes need to be backported to an older kernel.
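
As a rough illustration, on a kernel with that support the 64-bit values are visible directly via perfquery, and the per-port sysfs counter files may report the full 64-bit values as well (the LID and port number below are placeholders):

# query the PortCountersExtended attribute from the port's PMA
perfquery -x <lid> <port>

# re-read the sysfs counter; when backed by PortCountersExtended it no longer caps at 2^32-1
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data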
