Hi Angelo,
If this issue is affecting your production, then please email Mellanox support at support@mellanox.com so the issue can be properly examined.
Thanks,
Christeen
Hi,
I have an InfiniBand controller (Mellanox Technologies MT27600 [Connect-IB]) in HP blades running CentOS 6.8.
"ifconfig ib0" keeps giving the error "fetching interface information: Device not found".
I restarted the rdma service with no luck.
# service rdma status
Low level hardware support loaded:
mlx5_ib mlx5_core
Upper layer protocol modules:
ib_ipoib
User space access modules:
rdma_ucm ib_ucm ib_uverbs ib_umad
Connection management modules:
rdma_cm ib_cm iw_cm
Configured IPoIB interfaces: ib0
Currently active IPoIB interfaces: none
Thank you in advance
Appreciate that, Christeen, but as I mentioned, it's in our lab. It's a pain in the butt to reboot all the time: sometimes it runs for 3 or 4 days with no issues, and other times we reboot it 4-5 times in one day. I'm pretty sure the answer is somewhere in the logs, but I'm just unfamiliar with them and nothing jumps out when I search through them for an error.
Hello,
I get an error when I try to install MLNX_OFED for the RHEL 7.3 real-time kernel.
For the normal RHEL 7.3 kernel I can install OFED without any problems. What could be the problem? By the way, I installed kernel-devel and added --add-kernel-support, but I got the ERROR below during installation:
ERROR: Failed executing "MLNX_OFED_SRC-3.4-2.0.0.0/install.pl --tmpdir /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.2.2.rt56.424.el7.x86_64 --kernel-only --kernel 3.10.0-514.2.2.rt56.424.el7.x86_64 --kernel-sources /lib/modules/3.10.0-514.2.2.rt56.424.el7.x86_64/build --builddir /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.2.2.rt56.424.el7.x86_64/mlnx_iso.21610 --disable-kmp --force --build-only --distro rhel7.3"
ERROR: See /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.2.2.rt56.424.el7.x86_64/mlnx_ofed_iso.21610.log
Failed to build MLNX_OFED_LINUX for 3.10.0-514.2.2.rt56.424.el7.x86_64
I have a Mellanox ConnectX-3 VPI.
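For reference, here is a minimal sanity check worth running before retrying the build (a sketch only; it assumes the real-time kernel headers come from the kernel-rt-devel package, which may be named differently in your repositories):
# confirm the running kernel is the RT kernel the installer is building against
uname -r
# confirm the matching devel package and its build tree are present
rpm -q kernel-rt-devel
ls -d /lib/modules/$(uname -r)/build
# then retry the rebuild against the running kernel
./mlnxofedinstall --add-kernel-support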
Happy New Year to all!
With the clusters that I manage, I have to take care of many different Mellanox IB and Ethernet switches, HCAs and NICs. Some examples:
As a rule, I strive to always keep their firmware current. Here is an area where I would like Mellanox's input: what is the recommended tool to keep an HCA/NIC's firmware up to date? As far as I can see, there are four options:
Personally, I have come to the conclusion that mlxup is the simplest and most flexible. But am I correct? I'd love to hear Mellanox colleagues' comments.
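For what it's worth, this is the bare-bones mlxup flow I had in mind (a sketch; double-check the exact flags against the mlxup user manual for your version):
# list detected adapters and whether newer firmware is available
mlxup --query
# run interactively and let it update any device with out-of-date firmware
mlxup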
Regards,
Chin
Having managed HPC clusters (exclusively servers running RHEL/CentOS) in the national lab environment for quite a few years now, and with extensive experience with both MLNX_OFED and its inbox counterparts, I would like to make a few suggestions:
Regards,
Chin
Hi Chin,
We definitely agree that in some cases not all of the packages are needed, and some are even redundant; for that we have the option of a "slim" installation.
To use it, run "./mlnxofedinstall -p"; a conf file called /tmp/ofed-all.conf will then be created with all the packages included.
You can choose which packages you need and which you don't, and then run the installation script as follows: ./mlnxofedinstall -c /tmp/ofed-all.conf. Only the packages that you chose will be installed.
This is one option.
Another option is to install the driver from source, meaning to install only the source RPMs from each release.
This can be done by using the installation script inside the tarball located under the /src directory (MLNX_OFED_SRC).
Here you also have the -p and -c flags for an even slimmer installation.
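To make that concrete, the slim installation sequence described above looks like this (a sketch; commands and paths exactly as mentioned, and how you trim the conf file is up to you):
# create /tmp/ofed-all.conf listing all the packages
./mlnxofedinstall -p
# edit /tmp/ofed-all.conf, keeping only the packages you need,
# then install from the trimmed list
./mlnxofedinstall -c /tmp/ofed-all.conf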
Hope this information helps.
Regards,
Viki
Hi,
According to Microsoft, in Windows Server 2016 RTM (Hyper-V), when I create a vSwitch with Switch Embedded Teaming (SET) I should be able to use SR-IOV for VMs.
Is this currently supported with ConnectX-4 and WinOF-2 (v1.50 at this time)?
Hi Viki,
Thanks for sharing the useful tip. Much obliged. Learned some new things in the New Year.
Best,
Chin
The solution was to remove MLNX_OFED and use the distribution's drivers/kernel modules.
On my machine, perfquery -x returns 64-bit values for the port counters, but I am unable to determine where these counters live in sysfs. For example, the counter /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data is only a 32-bit value and is maxed out at 4294967295. According to the mlx5 docs there should be a counters_ext directory, but that is not present on my system. Is there a way to enable that with mlx4, or how else can I get the correct value?
I have a dual-port ConnectX-3 (HP branded).
When I open Device Manager > System Devices > Mellanox NIC > Port Protocol
Only Port 1 is available to change between IB, ETH and AUTO. Port 2 is greyed out.
When we installed the NIC in the server, both ports were set to IB; we then changed both to ETH. Now I need to change them back, with no luck.
I have tried reinstalling the driver, changing the settings with MLXTOOL, and restoring the NIC to defaults with PowerShell.
Anyone know what to do?
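For reference, a sketch of the mlxconfig route to flipping the port protocol from the firmware side (the device name below is a placeholder; take the real one from mst status, and a reboot is needed for the change to take effect):
# list MST devices to get the device name
mst status
# query the current per-port protocol settings (LINK_TYPE_P1 / LINK_TYPE_P2)
mlxconfig -d mt4099_pci_cr0 query
# 1 = IB, 2 = ETH, 3 = VPI/auto; set port 2 back to IB
mlxconfig -d mt4099_pci_cr0 set LINK_TYPE_P2=1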
Thanks for the response,
What kind of performance hit should I expect on an SX1024 due to the packet fragmentation that happens during inter-VLAN routing?
This is a diagram of my current set up.
+----------------+
| Linux Router |
| ConnectX-3 |
| port 1 port 2 |
+----------------+
/ \
+---------------+ / \ +---------------+
| Host 1 | / A A \ | Host 2 |
| ConnectX-4-LX | / \ | ConnectX-4-LX |
| Port 1 |- -| Port 1 |
| Port 2 |----------------| Port 2 |
+---------------+ B +---------------+
The Linux router has the ConnectX-3 (not Pro) card in Ethernet mode and uses a breakout cable (port 1 only) to connect to the ConnectX-4-LX cards at 10 Gb as path 'A'. The second port of each ConnectX-4-LX card is connected directly at 25 Gb as path 'B'. Hosts 1 and 2 are running CentOS 7.2 with kernel 3.10.0-327.36.3.el7.x86_64 and OFED 3.4. The Linux router is running CentOS 7.2 with a 4.9.0 kernel.
iSER and RDMA work fine over path 'B' and path 'A' (in either bridge or router mode), and now I want to add latency and drop packets to understand the effects. I'm using tc and netem to add the latency into the path. When I add 0.5 ms of latency in both directions, iSER slows to a crawl, throws errors in dmesg, and sometimes even causes the file system to go read-only. If I set the latency back to zero then things clear up and full 10 Gb is achieved. Iperf performs the same with the latency set to 0 or 0.5 ms in each direction. We would like to get RoCE to work over high-latency, high-bandwidth links. If anyone has ideas on how to resolve this issue, I'd love to hear them.
Commands run on the router server:
for i in 2 3; do tc qdisc change dev eth${i} root netem delay .5ms; done
# brctl show
bridge name     bridge id               STP enabled     interfaces
rleblanc        8000.f452147ce541       no              eth2
                                                        eth3
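To double-check what netem is doing on those interfaces, and to back the change out, something along these lines should work (a sketch, assuming eth2/eth3 as above):
# show the qdisc currently attached to each interface
for i in 2 3; do tc qdisc show dev eth${i}; done
# remove the netem delay entirely, reverting to the default qdisc
for i in 2 3; do tc qdisc del dev eth${i} root; done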
The iSER target is a 100 GB RAM disk exported via iSER. I format the disk on the initiator with ext4 and then run this fio command:
echo "3" > /proc/sys/vm/drop_caches; fio --rw=read --bs=4K --size=1G --numjobs=40 --name=worker.matt --group_reporting
I see these messages on the initiator:
[25863.623453] 00000000 00000000 00000000 00000000
[25863.628564] 00000000 00000000 00000000 00000000
[25863.633634] 00000000 00000000 00000000 00000000
[25863.638619] 00000000 08007806 250003c7 0b0190d3
[25863.643593] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78
[25863.651180] connection40:0: detected conn error (1011)
[25874.368881] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe
[25874.375619] 00000000 00000000 00000000 00000000
[25874.380690] 00000000 00000000 00000000 00000000
[25874.385712] 00000000 00000000 00000000 00000000
[25874.390693] 00000000 08007806 250003c8 0501ddd3
[25874.395681] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78
[25874.403283] connection40:0: detected conn error (1011)
[25923.829903] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe
[25923.836663] 00000000 00000000 00000000 00000000
[25923.841724] 00000000 00000000 00000000 00000000
[25923.846752] 00000000 00000000 00000000 00000000
[25923.851733] 00000000 08007806 250003c9 510134d3
[25923.856709] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78
[25923.864308] connection40:0: detected conn error (1011)
[25943.184313] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe
[25943.191079] 00000000 00000000 00000000 00000000
[25943.196208] 00000000 00000000 00000000 00000000
[25943.201287] 00000000 00000000 00000000 00000000
[25943.206281] 00000000 08007806 250003ca 1afdbdd3
[25943.211272] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78
[25943.218901] connection40:0: detected conn error (1011)
[25962.538633] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe
[25962.545396] 00000000 00000000 00000000 00000000
[25962.550475] 00000000 00000000 00000000 00000000
[25962.555551] 00000000 00000000 00000000 00000000
[25962.560533] 00000000 08007806 250003cb 21012ed3
[25962.565526] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78
[25962.573155] connection40:0: detected conn error (1011)
[25973.291038] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe
[25973.297861] 00000000 00000000 00000000 00000000
[25973.302978] 00000000 00000000 00000000 00000000
[25973.308025] 00000000 00000000 00000000 00000000
[25973.313014] 00000000 08007806 250003cc 1901d2d3
[25973.318004] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78
[25973.325601] connection40:0: detected conn error (1011)
[26039.955899] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe
[26039.962690] 00000000 00000000 00000000 00000000
[26039.967825] 00000000 00000000 00000000 00000000
[26039.972894] 00000000 00000000 00000000 00000000
[26039.977891] 00000000 08007806 250003cd 850172d3
[26039.982905] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78
[26039.990512] connection40:0: detected conn error (1011)
[26067.411753] mlx5_warn:mlx5_1:dump_cqe:257:(pid 0): dump error cqe
[26067.418598] 00000000 00000000 00000000 00000000
[26067.423733] 00000000 00000000 00000000 00000000
[26067.428832] 00000000 00000000 00000000 00000000
[26067.433826] 00000000 08007806 250003ce 092977d3
[26067.438818] iser: iser_handle_wc: wr id ffffffffffffffff status 6 vend_err 78
[26067.446462] connection40:0: detected conn error (1011)
There are no messages on the target server.
Hello,
I have two Windows servers directly connected with Mellanox ConnectX-3 adapters, with no switch infrastructure in between.
Is this a supported scenario with functioning RDMA/RoCE?
If so, do I still need to implement Data Center Bridging in Windows, or is it only required when switches are in play?
Starting from Windows Server 2012 and later (including Windows Server 2016, of course), all teaming drivers and support are within the Microsoft native OS NetLBFO.
Mellanox is not involved whatsoever in providing the module, packages, etc., so it's up to MS to confirm whether the ConnectX-4 or any other adapter is in their support compatibility matrix.
See https://technet.microsoft.com/en-us/library/jj130849.aspx
See also the relevant MSDN documentation on that:
Learn to Develop with Microsoft Developer Network | MSDN
Can anyone help with the above issue?
Hi Oskar,
Can you provide the contents of the logfile that is referenced in /tmp/MLNX_OFED_LINUX-3.4-2.0.0.0-3.10.0-514.2.2.rt56.424.el7.x86_64/mlnx_ofed_iso.21610.log?
The logfile name should contain *.rpmbuild.log.
Thanks and regards,
~Martijn
Aviap,
SR-IOV is not supported on a SET team.
If a device's PMA supports the extended port counters (which is the case for yours), whether they are exposed depends on which kernel is being used. There were recent kernel changes to utilize the optional PortCountersExtended attribute rather than the mandatory PortCounters. So either a recent kernel with these changes is needed to see this, or the relevant changes must be backported to an older kernel.
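As a quick way to see which behaviour a given kernel exhibits (a sketch, using mlx4_0 port 1 as in the question; paths and tools taken from the question itself):
# kernel in use
uname -r
# on older kernels these sysfs counters are 32-bit and may cap at 4294967295
ls /sys/class/infiniband/mlx4_0/ports/1/counters*
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data
# compare with the 64-bit extended counters reported by the PMA
perfquery -x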