There is a guide on how to configure PTP on ONYX switches
Re: Using SN2100 or SN2700 as PTP master clock?
Re: MSX1012B MSX6012F
Thank you for your response. It really helps!
Concurrent INFINIBAND multicast writers
Hi everyone!
I am working on a project in which I have a small set of servers with ConnectX 3 HCAs connected to an IS5030 switch.
No TCP, just IB.
Given either 1 or many multicast groups, with one reader and one writer on each machine with the appropriate cpu affinity,
I observe the following behavior:
Only 1 writer in the cluster, everything else reads: the only increments in XmitWait are on the sending HCA, which is just trying to get the multicast packets to the switch.
All of the IB counters on everything look great, even at many multiples of the message rate compared to the problem scenario below.
If I introduce just 1 more multicast writer into the mix and both are at 5k msg/sec, XmitWait on the transmitting switch ports for the multicast group starts growing. The more writers, the worse it gets.
A subnet manager is running on the switch. I have tried segregating the traffic into different VLs and turning on congestion control.
Something about two machines generating multicast traffic to the same switch at any decent frequency seems to trigger this.
I'm using 4k buffers but my message size is only 512 bytes.
Does anyone have any insight into what would be causing the congestion?
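One way to narrow down where the waits accumulate is to sample the port counters twice and compare the deltas per port. The following is a minimal Python sketch, assuming the counters have been dumped as dotted `Name:....value` lines; the sample values and the function names `parse_counters` and `counter_deltas` are illustrative and were not captured from the cluster described above.

```python
import re

# Matches counter lines such as "PortXmitWait:.........1234".
COUNTER_LINE = re.compile(r"^\s*(\w+):\.*\s*(\d+)\s*$")


def parse_counters(text):
    """Return {counter_name: int_value} from a dotted counter dump."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_deltas(before, after):
    """Return only the counters that increased between two samples."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# Hypothetical samples taken a few seconds apart on one switch port.
sample_1 = """\
PortXmitData:....................1000
PortXmitWait:....................50
SymbolErrorCounter:..............0
"""
sample_2 = """\
PortXmitData:....................9000
PortXmitWait:....................750
SymbolErrorCounter:..............0
"""

deltas = counter_deltas(parse_counters(sample_1), parse_counters(sample_2))
print(deltas)
```

Running the same comparison with one writer and then with two writers would show which ports start accumulating XmitWait only in the second case.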
Re: Support for "INBOX drivers?" for 18.04/connected mode?
Hi Bill,
It would be great if you could run the following commands to check whether the ipoib_enhanced parameter is supported on your system with the inbox driver:
1. # find /lib/modules -iname '*ib_ipoib*'
(Example output from host running Ubuntu 16.04)
/lib/modules/4.4.0-134-generic/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko
/lib/modules/4.4.0-116-generic/updates/dkms/ib_ipoib.ko
The first line is the inbox driver and the second line is the mlnx driver.
2. # modinfo /lib/modules/4.4.0-134-generic/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko
Check which parameters are supported (they appear in the output on lines that start with "parm:").
Compare this with the modinfo output for the mlnx driver:
# modinfo /lib/modules/4.4.0-116-generic/updates/dkms/ib_ipoib.ko
Please share the outputs.
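The comparison of the two modinfo outputs can also be done mechanically. This is a minimal Python sketch, assuming each output has been saved as text; the sample outputs below are shortened illustrations (the descriptions are placeholders, not real modinfo text), and `module_params` is a hypothetical helper name.

```python
def module_params(modinfo_output):
    """Return the set of parameter names listed on 'parm:' lines."""
    params = set()
    for line in modinfo_output.splitlines():
        if line.startswith("parm:"):
            # Line shape: "parm:           name:description (type)"
            name = line[len("parm:"):].strip().split(":", 1)[0]
            params.add(name)
    return params


# Illustrative, shortened outputs; real modinfo output has more fields.
inbox_output = """\
filename:       /lib/modules/4.4.0-134-generic/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko
parm:           send_queue_size:placeholder description (int)
parm:           recv_queue_size:placeholder description (int)
"""
mlnx_output = """\
filename:       /lib/modules/4.4.0-116-generic/updates/dkms/ib_ipoib.ko
parm:           send_queue_size:placeholder description (int)
parm:           recv_queue_size:placeholder description (int)
parm:           ipoib_enhanced:placeholder description (int)
"""

only_in_mlnx = module_params(mlnx_output) - module_params(inbox_output)
print(sorted(only_in_mlnx))
```

Any name that appears only in the second set is a parameter the inbox module does not accept.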
Re: ConnectX-5 error: Failed to write to /dev/nvme-fabrics: Invalid cross-device link
Please run # dmesg | grep "enabling port" and check whether the output contains "....nvmet_rdma: enabling port....".
Re: Login directly to Enable Mode
Hello Donny Hariady,
Thank you for contacting Mellanox Global Support. This is Paul, and your case is currently under my care. The answer to your question is no, not from the switch. There is no knob on the switch to skip it.
But you may rely on the auto login from SecureCRT or Xshell to achieve it.
Mellanox on vSphere - Error Extracting File
Brand new card, newly installed in the server. It likely has a back-rev firmware issue, but the server runs vSphere 6.7, so I am working through the process to get it up and working for iSER.
When I follow the documentation to deploy the driver (which is basically working, as the driver goes live), I try to install per the installation guide but get this error:
############
[root@x395001:/tmp] esxcli software acceptance set --level=PartnerSupported
Host acceptance level changed to 'PartnerSupported'.
[root@x395001:/tmp] esxcli software sources profile list -d /tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip
[MetadataDownloadError]
Could not download from depot at zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml, skipping (('zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml', '', 'Error extracting index.xml from /tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip: "There is no item named \'index.xml\' in the archive"'))
url = zip:/tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip?index.xml
Please refer to the log file for more details.
[root@x395001:/tmp] lspci | grep Mellanox
0000:18:00.0 Network controller: Mellanox Technologies MT27800 Family [ConnectX-5] [vmnic8]
0000:18:00.1 Network controller: Mellanox Technologies MT27800 Family [ConnectX-5] [vmnic9]
[root@x395001:/tmp] esxcli software vib list | grep nmlx
nmlx4-core 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2018-07-31
nmlx4-en 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2018-07-31
nmlx4-rdma 3.16.11.6-1OEM.650.0.0.4598673 MEL VMwareCertified 2018-07-31
nmlx5-core 4.16.12.12-1OEM.650.0.0.4598673 MEL VMwareCertified 2018-07-31
nmlx5-rdma 4.16.12.12-1OEM.650.0.0.4598673 MEL VMwareCertified 2018-07-31
[root@x395001:/tmp]
########
I downloaded it twice to rule out a download issue. I tried two separate servers and got the same issue.
Windows Firmware - flint "no command found"
Fresh install of Windows 2016 with a Mellanox MCX512A-ACAT.
Make sure firmware is updated
Firmware Flash Card
http://www.mellanox.com/page/management_tools
PS C:\Windows\system32> cd 'C:\Program Files\Mellanox\WinMFT\'
PS C:\Program Files\Mellanox\WinMFT> mst status
MST devices:
------------
mt4119_pciconf0
PS C:\Program Files\Mellanox\WinMFT> mst status -v
MST devices:
------------
mt4119_pciconf0 bus:dev.fn=1a:00.0
mt4119_pciconf0.1 bus:dev.fn=1a:00.1
PS C:\Program Files\Mellanox\WinMFT> mlxfwmanager -d mt4119_pciconf0 --query
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX512A-ACA_Ax
Description: ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6
PSID: MT_0000000080
PCI Device Name: mt4119_pciconf0
Base GUID: 98039b0300325eba
Base MAC: 98039b325eba
Versions: Current Available
FW 16.22.1002 N/A
PXE 3.5.0403 N/A
UEFI 14.15.0019 N/A
Status: No matching image found
PS C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin> dir
Directory: C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin
Mode LastWriteTime Length Name
---- ------------- ------ ----
------ 7/12/2018 8:57 AM 16777216 fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin
PS C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin> flint.bat -d mt4119_pciconf0 -i .\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin
No command found.
PS C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin> flint
No options found.
copy C:\ftp\Mellanox_iSER\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin C:\Program Files\Mellanox\WinMFT\
PS C:\Program Files\Mellanox\WinMFT> .\flint_ext.exe -d mt4119_pciconf0 -i .\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin
No command found.
PS C:\Program Files\Mellanox\WinMFT> .\flint.bat -d mt4119_pciconf0 -i .\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin
No command found.
PS C:\Program Files\Mellanox\WinMFT>
I know the firmware is back rev. I tried a simple run of the flint command per the documentation, then ".exe" vs. ".bat", then moved the firmware bin file into the same directory as the flint binary. No change. When I run "flint" by itself it gives a different response, saying that I am missing options, so "No command found" is not helpful at all as to what is not being found.
thanks,
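The "No command found" message is consistent with flint expecting a command word, such as query or burn, after the options; the invocations above pass only -d and -i. Here is a minimal Python sketch that assembles such a command line without running it. The helper name `flint_argv` is hypothetical, and whether burn is the right command for this card and image is an assumption to check against the MFT documentation.

```python
def flint_argv(device, image=None, command="query"):
    """Build a flint argument list: options first, command word last."""
    argv = ["flint", "-d", device]
    if image is not None:
        argv += ["-i", image]
    argv.append(command)  # e.g. "query" or "burn"
    return argv


image = r".\fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin"

# Read-only check of the device first, then the burn invocation.
print(" ".join(flint_argv("mt4119_pciconf0")))
print(" ".join(flint_argv("mt4119_pciconf0", image, "burn")))
```

Starting with the read-only query form is the safer first step, since it confirms the device name is accepted before anything is written.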
Re: Mellanox on vSphere - Error Extracting File
I think I see the issue here: it is a zip within a zip.
[root@x385004:/tmp] unzip MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip
Archive: MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-8873266.zip
inflating: MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-offline_bundle-8873266.zip
inflating: doc/README.txt
inflating: doc/open_source_licenses_nmlx5-core_4.17.13.8-1OEM.670.0.0.8169922.txt
inflating: doc/release_note_nmlx5-core_4.17.13.8-1OEM.670.0.0.8169922.pdf
inflating: doc/open_source_licenses_nmlx5-rdma_4.17.13.8-1OEM.670.0.0.8169922.txt
inflating: doc/release_note_nmlx5-rdma_4.17.13.8-1OEM.670.0.0.8169922.pdf
inflating: source/driver_source_nmlx5-core_4.17.13.8-1OEM.670.0.0.8169922.tgz
inflating: source/driver_source_nmlx5-rdma_4.17.13.8-1OEM.670.0.0.8169922.tgz
[root@x385004:/tmp] esxcli software vib install -d /tmp/MLNX-NATIVE-ESX-ConnectX-4-5_4.17.13.8-10EM-670.0.0-offline_bundle-8873266.zip
Installation Result
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
VIBs Installed: MEL_bootbank_nmlx5-core_4.17.13.8-1OEM.670.0.0.8169922, MEL_bootbank_nmlx5-rdma_4.17.13.8-1OEM.670.0.0.8169922
VIBs Removed: MEL_bootbank_nmlx5-core_4.16.12.12-1OEM.650.0.0.4598673, MEL_bootbank_nmlx5-rdma_4.16.12.12-1OEM.650.0.0.4598673
VIBs Skipped:
[root@x385004:/tmp]
<sigh> It would be nice if this were noted somewhere in the readme, or if the zip were named to show that it needs to be unzipped before scp to the vSphere host.
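A quick check before copying a download to the host can catch this case. The sketch below, in Python, looks for a top-level index.xml (the item the esxcli error reports as missing) and lists any nested zip files. The function names and the in-memory archives are illustrative stand-ins, not the actual Mellanox bundle.

```python
import io
import zipfile


def inspect_bundle(zip_file):
    """Report whether a zip has a top-level index.xml or wraps other zips."""
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
    nested = [name for name in names if name.lower().endswith(".zip")]
    return {"has_index_xml": "index.xml" in names, "nested_zips": nested}


def make_zip(members):
    """Build an in-memory zip from {name: content} for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


# Stand-ins: an offline bundle wrapped inside an outer download zip.
inner = make_zip({"index.xml": "<index/>"})
outer = make_zip({
    "example-offline_bundle.zip": inner.getvalue(),
    "doc/README.txt": "readme",
})

outer_report = inspect_bundle(outer)
inner_report = inspect_bundle(inner)
print(outer_report)
print(inner_report)
```

If the report shows no index.xml but one nested zip, the nested file is the one to pass to esxcli.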
Re: Windows Firmware - flint "no command found"
Flashing firmware on cards... so how many tools and paths do we have here?
We have flint (noted above).
We have "WinMFT64", and that did not work.
Then I found "mlxup", which I downloaded for Windows and which ran just fine: it noted the firmware level, found new firmware on the internet, and ran the installation.
Windows
#########
PS C:\ftp\Mellanox_iSER> .\mlxup
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX512A-ACA_Ax
Description: ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6
PSID: MT_0000000080
PCI Device Name: mt4119_pciconf0
Base GUID: 98039b0300325eba
Base MAC: 98039b325eba
Versions: Current Available
FW 16.22.1002 16.23.1020
PXE 3.5.0403 3.5.0504
UEFI 14.15.0019 14.16.0017
Status: Update required
---------
Found 1 device(s) requiring firmware update...
Perform FW update? [y/N]: y
Device #1: Updating FW ... Done
Restart needed for updates to take effect.
Log File: C:\Users\TEMP\AppData\Local\Temp\mlxup-20181015_141528_6900.log
PS C:\ftp\Mellanox_iSER>
#########
And on Linux (ppcle and x86) it ran just fine:
#######
[root@l82471 ~]# ./mlxup
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX512A-ACA_Ax
Description: ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6
PSID: MT_0000000080
PCI Device Name: 0002:01:00.0
Base GUID: 98039b0300325f82
Base MAC: 98039b325f82
Versions: Current Available
FW 16.22.1002 16.23.1020
PXE 3.5.0403 3.5.0504
UEFI 14.15.0019 14.16.0017
Status: Update required
---------
Found 1 device(s) requiring firmware update...
Perform FW update? [y/N]: y
#########
But it failed miserably on VMware vSphere 6.7, so I am still in firmware purgatory for that OS.
#######
[root@x395001:/tmp] ls
LenovoIMMLog.log pciList.txt
cimple_log_err_messages probe.session
fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin scratch
ipp sfcb
libmofc.log vmware-root
lspci.txt vmware_version.txt
mlxup wbem-vm-report.xml
[root@x395001:/tmp] mlxup
-sh: mlxup: not found
[root@x395001:/tmp] ./mlxup --query
-sh: ./mlxup: Permission denied
[root@x395001:/tmp] chmod +x mlxup
[root@x395001:/tmp] ls
LenovoIMMLog.log pciList.txt
cimple_log_err_messages probe.session
fw-ConnectX5-rel-16_23_1020-MCX512A-ACA_Ax-UEFI-14.16.17-FlexBoot-3.5.504.bin scratch
ipp sfcb
libmofc.log vmware-root
lspci.txt vmware_version.txt
mlxup wbem-vm-report.xml
[root@x395001:/tmp] ./mlxup
-E- cannot use a string pattern on a bytes-like object
[root@x395001:/tmp]
#######
Re: Symbol Errors
There is no need to calculate symbol errors - the symbol error counter is available to query on each physical IB port.
Re: rx-out-of-buffer
Were you able to find answers to your query? I have the same question.
Re: RoCE - Cisco catalyst 4506 switch
Please refer to the articles below:
https://community.mellanox.com/docs/DOC-3062 Enabling ECN and PFC on Cisco switches.
https://community.mellanox.com/docs/DOC-2855 The main page for configuring RoCE end to end.
Please refer to the NIC part and combine it with the Cisco switch configuration; that should give you a way to achieve it.
Re: rx-out-of-buffer
No, even though Mellanox force-accepted two answers; apparently they like good stats.
If you have any clue, don't hesitate to share.
My fear is that they will not disclose this information, as with any internals of the NICs. There are some internal buffers somewhere that we run out of. The queues' "imissed" counters queried via DPDK clearly report 0 loss, so it is not the queues that lack buffers, or if it is, there is a bug in the reporting. As with those infinite flow tables that work like magic, they probably won't say anything...
Re: Windows Firmware - flint "no command found"
Re: rx-out-of-buffer
Hmm. That's not fair.
On a lighter note, looks like question-accepted-stats and mlx5-ethtool-counter-stats -- both are misleading
Re: Setting up FDR infiniband
Hi,
I suggest contacting the Mellanox Sales department at sales@mellanox.com.
They will answer all your questions.
Re: PTP synchronization in ConnectX-5
Could you share additional details: what software you are using for PTP, how it is configured, and the logs that show the issue? Do you use the Mellanox OFED drivers? If so, which version?
IPOIB and RDMA verification over InfiniBand
Hello everyone;
I am new to InfiniBand and working on my final-year project, in which I initially have to configure IPoIB and RDMA over InfiniBand. I have configured both of them on Oracle Linux; the status of RDMA is "active" and it is "enabled" (see capture 1 and capture 2).
I tested the bandwidth by running the command "ib_write_bw --report_gbits --iters=100000 --bidirectional" and waiting for the client to connect. The average bandwidth came out to be around 20 Gbps (see capture 3).
First, I want to know whether my RDMA configuration is working fine, or whether I need to run other scripts or commands to verify RDMA.
Second, if it is working fine, why am I getting an average bandwidth of only 20 Gbps when InfiniBand should provide a link of around 40 Gbps? I have also increased the number of iterations, but it did not affect the average bandwidth.
I am using one port of the IB card on each system. Is that the reason for getting 20 Gbps, or does each of the two ports support a 40 Gbps link?
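For context on the numbers in this question: the advertised InfiniBand rate is the signaling rate of the link, and the usable data rate is lower once line encoding is subtracted. The Python sketch below works through that arithmetic for the common generations, using the standard per-lane rates and encodings. It does not diagnose the setup above; the negotiated rate on the actual port (for example from ibstat) and the PCIe slot width would still need to be checked.

```python
# Per-lane signaling rate in Gb/s and line-encoding efficiency.
IB_GENERATIONS = {
    "SDR": (2.5, 8 / 10),      # 8b/10b encoding
    "DDR": (5.0, 8 / 10),      # 8b/10b encoding
    "QDR": (10.0, 8 / 10),     # 8b/10b encoding
    "FDR": (14.0625, 64 / 66), # 64b/66b encoding
}


def link_rates(generation, lanes=4):
    """Return (signaling_gbps, data_gbps) for one port of the given width."""
    per_lane, efficiency = IB_GENERATIONS[generation]
    signaling = per_lane * lanes
    return signaling, signaling * efficiency


for generation in IB_GENERATIONS:
    signaling, data = link_rates(generation)
    print(f"{generation} 4x: {signaling:.2f} Gb/s signaling, {data:.2f} Gb/s data")
```

On this arithmetic a 4x QDR port carries at most 32 Gb/s of data and a 4x DDR port at most 16 Gb/s. Each port is its own link, so using one port out of two does not by itself halve the rate.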
Regards
Re: ConnectX-5 EN SR-IOV max_vfs
Hi,
Please note that this question depends on the firmware and OS capabilities.
You can refer to page 21 of the release notes for the latest ConnectX-5 EN firmware:
http://www.mellanox.com/pdf/firmware/ConnectX5-FW-16_23_1020-release_notes.pdf
The current firmware supports:
Reduced firmware’s memory consumption to increase the supported number of VFs per PF to up to 100.
There is no single optimal recommendation; it always depends on the server's SR-IOV capabilities and the adapter firmware.
Thanks,
Samer