Hi Samer,
Let's say I'm using the latest CentOS 7 (currently v7.5, April 2018 build). What is the best number of PFs and VFs to use, and why? Are there any pros and cons to consider for having too many or too few PFs and VFs?
Best regards,
Hi,
I suggest reviewing the MLNX_OFED user manual (the latest version, 4.4, supports CentOS 7.5).
You can refer to page 222, section 3.4.1 SR-IOV:
http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v4_4.pdf
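At a high level, the flow described there is to enable SR-IOV in the firmware and then instantiate the VFs through sysfs. A rough sketch (the MST device path and interface name below are only examples, please adapt them to your system and confirm the exact steps against the manual):

mst start
mlxconfig -d /dev/mst/mt4121_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=16   # enable SR-IOV and set the firmware VF limit
reboot                                                               # firmware changes take effect after a reboot
echo 8 > /sys/class/net/ens1f0/device/sriov_numvfs                   # after reboot, create the number of VFs you need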
Thanks,
Samer
Hi Samer,
It seems that document is valid for VPI cards, while we are talking about EN cards in this discussion. Even if it is valid for EN cards, I only found compatibility-related information regarding the number of configured PFs and VFs. I still need the performance-related aspects of that configuration.
Best regards,
Regarding the numbers, as mentioned in the ConnectX-5 EN firmware release notes:
http://www.mellanox.com/pdf/firmware/ConnectX5-FW-16_23_1020-release_notes.pdf
The maximum number of Virtual Functions (VFs) per port is 64. Note: When increasing the number of VFs, the following limitations must be taken into consideration:
server_total_bar_size >= (num_pfs) * (2^log_pf_uar_bar_size + 2^log_vf_uar_bar_size * total_vfs)
server_total_msix >= (num_pfs) * (num_pf_msix + num_vfs_msix * total_vfs)
Note: For the maximum number of VFs supported by your driver, please refer to your driver's Release Notes or User Manual.
You can refer to the MLNX_EN driver documentation as well:
http://www.mellanox.com/related-docs/prod_software/Mellanox_EN_for_Linux_User_Manual_v4_4.pdf
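To see the ceiling your firmware and driver actually expose on your own system, you can also read it back from sysfs (replace ens1f0 with your interface name):

cat /sys/class/net/ens1f0/device/sriov_totalvfs   # maximum number of VFs the device reports
cat /sys/class/net/ens1f0/device/sriov_numvfs     # number of VFs currently instantiated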
I have checked and found that WinOF v4.80.5000 (from 2014) is the latest driver (mlx4_bus) that supported the Mellanox ConnectX-2 adapter, along with Windows 7 Client (64-bit only) and Windows 8.1 Client (64-bit only). Windows 10 is of course not supported, so that probably explains why it worked fine for you in Win7 but failed in Win10.
More information on that can be found in "WinOF Download Center" --> Archive Versions
http://www.mellanox.com/page/products_dyn?product_family=32&mtag=windows_sw_drivers
What about vSphere 6.7?
I Googled around and have no idea what its issue is with the binary.
###
[root@x385004:/tmp] ./mlxup
-E- cannot use a string pattern on a bytes-like object
[root@x385004:/tmp]
###
We have MFT for ESXi 6.7 on the same page; you can use it. The user manual is below:
http://www.mellanox.com/related-docs/MFT/MFT_user_manual_4_10_0.pdf
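Once the bundle is installed, the tools are typically found under /opt/mellanox/bin. For example (the MST device name below is only an illustration, take the real one from the mst status output):

/opt/mellanox/bin/mst status                       # list the Mellanox MST devices on the host
/opt/mellanox/bin/flint -d mt4119_pciconf0 query   # query firmware version and PSID of the adapter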
Hello everyone;
I am new to InfiniBand and working on my final-year project, in which, initially, I have to configure IPoIB and RDMA over InfiniBand. I have configured both of them on Oracle Linux, and the status of RDMA is "active" and "enabled" (see capture 1 and capture 2).
I tried to test the bandwidth by running the command "ib_write_bw --report_gbits --iters=100000 --bidirectional" and waited for the client to connect. The average bandwidth came out to be around 20 Gbps (see capture 3).
Firstly, I want to know whether my RDMA configuration is working fine, or whether I need to run other scripts or commands as well to verify RDMA.
Secondly, if it is working fine, why am I only getting an average bandwidth of around 20 Gbps when InfiniBand should provide around a 40 Gbps link? I have also increased the number of iterations, but it didn't affect the average bandwidth.
I am utilizing one port of the IB card on each system. So, is that the reason I am getting 20 Gbps bandwidth, or does each of the two ports support a 40 Gbps link?
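For reference, I understand the negotiated link rate and width of the port can be checked like this (I still have to verify it on my setup):

ibstat                                              # shows State and Rate (40 for a QDR 4X link) per port
ibv_devinfo | grep -E 'active_width|active_speed'   # width and speed as seen by the verbs layer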
Regards
Hi Samer,
Do you mean that the following statement on page 3 of http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX-5_EN_Card.pdf is not achievable?
– SR-IOV: Up to 1K Virtual Functions
– SR-IOV: Up to 16 Physical Functions per host
If we do some calculation, the above statement seems to work out to 64 VFs per PF (1K/16). Is that true?
Let's say it is true and there is only 1 PF I can use that is compatible with the OS; should I use a 64-VF configuration for performance? How about using lower VF counts (32, 16, 8, 4, 2 or other) with regard to performance? When should I use a small number, and when should I use a large one?
Best regards,
Hi Alex,
Thank you for posting your question on the Mellanox Community.
Based on the information provided, we cannot determine what issue you are experiencing.
Our recommendation is to reinstall with the latest WinOF driver as user Administrator. If the issue still appears after the re-install, collect a System-Snapshot from the node and open a Mellanox Support case.
The screenshot provided did not appear in the post.
You can download the latest driver through the following link -> http://www.mellanox.com/downloads/WinOF/MLNX_VPI_WinOF-5_50_50000_All_Win2016_x64.exe
Thanks and regards,
~Mellanox Technical Support
Hi all.
Executing an "ibcheckerrors" on a node, I see several errors reported beyond the thresholds.
I suppose these thresholds are related to time, so in a fabric where ibclearerrors and ibclearcounters have not been executed in 2 months, maybe 50 Symbol Errors on a port are not important, although the number is beyond the threshold (10).
My question is: where can I check what a normal or expected error counter is, and what is the real threshold beyond which I need to worry about them?
I can't find any document or link that explains which errors are bad, or what threshold over what time period should be considered bad, or anything like that.
I saw the post "How to test RDMA traffic congestion", which gives some details about tests and expected errors. I'm looking for something like that, but covering all the error counters that ibdiagnet reports, with the threshold to worry about over a given time frame.
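In the meantime, my plan is to clear the counters, let the fabric run for a known interval and re-check, so I can judge the error rate rather than two months of accumulated history; something like:

ibclearerrors       # zero the error counters fabric-wide
ibclearcounters     # zero the traffic counters as well
sleep 3600          # let the fabric run under normal load for a known interval
ibqueryerrors       # report the ports whose error counters now exceed the thresholds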
Thanks.
I suggest trying to "passthrough" the Mellanox adapters to the VMs, where they can actually "inherit" the physical NIC's performance capability. I'm sure you will then get much better performance between the virtual machines.
Avi
We are not using the Mellanox OFED driver. Instead, we are using the standard Linux driver. So, in order to capture RoCEv2 traffic, we use port mirroring in a switch to copy all traffic between host and target to a monitoring PC. This PC has a ConnectX-4 rNIC. It was running Ubuntu 17.10 with the 4.13 kernel, and I was able to run Wireshark to capture the RoCEv2 traffic.
I recently upgraded that PC to Ubuntu 18.04 with the 4.15 kernel. After that, I can't capture RoCE traffic any more. I debugged a bit and got these counters:
yao@Host2:~$ ethtool -S enp21s0f0 | grep rdma
rx_vport_rdma_unicast_packets: 3112973874
rx_vport_rdma_unicast_bytes: 1434773360288
tx_vport_rdma_unicast_packets: 362387261
tx_vport_rdma_unicast_bytes: 27309669154
So the mlx5 driver drops all RDMA packets. Why is it doing that? Is there any configuration to change its behavior back to the old one (in kernel 4.13)? I don't see such a configuration in Mellanox_OFED_Linux_User_Manual_v4_3.pdf.
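For what it's worth, RoCEv2 frames are carried over UDP destination port 4791, so my basic check for whether the mirrored traffic reaches the capture NIC at all (before any RDMA steering) is:

sudo tcpdump -i enp21s0f0 -nn -c 10 udp dst port 4791   # should print a few RoCEv2 packets if mirroring works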
Here's the driver info in the old kernel:
dotadmin@DavidLenovo:~$ modinfo mlx5_core
filename: /lib/modules/4.13.0-46-generic/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 5.0-0
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli@mellanox.com>
srcversion: 79D72EC6EB494E762310F77
alias: pci:v000015B3d0000A2D3sv*sd*bc*sc*i*
alias: pci:v000015B3d0000A2D2sv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Csv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Bsv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias: pci:v000015B3d00001019sv*sd*bc*sc*i*
alias: pci:v000015B3d00001018sv*sd*bc*sc*i*
alias: pci:v000015B3d00001017sv*sd*bc*sc*i*
alias: pci:v000015B3d00001016sv*sd*bc*sc*i*
alias: pci:v000015B3d00001015sv*sd*bc*sc*i*
alias: pci:v000015B3d00001014sv*sd*bc*sc*i*
alias: pci:v000015B3d00001013sv*sd*bc*sc*i*
alias: pci:v000015B3d00001012sv*sd*bc*sc*i*
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
depends: devlink,ptp,mlxfw
intree: Y
name: mlx5_core
vermagic: 4.13.0-46-generic SMP mod_unload
signat: PKCS#7
signer:
sig_key:
sig_hashalgo: md4
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm: prof_sel:profile selector. Valid range 0 - 2 (uint)
Here's the driver info in the new kernel:
yao@Host2:~$ modinfo mlx5_core
filename: /lib/modules/4.15.0-36-generic/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 5.0-0
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli@mellanox.com>
srcversion: C271CE9036D77E924A8E038
alias: pci:v000015B3d0000A2D3sv*sd*bc*sc*i*
alias: pci:v000015B3d0000A2D2sv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Csv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Bsv*sd*bc*sc*i*
alias: pci:v000015B3d0000101Asv*sd*bc*sc*i*
alias: pci:v000015B3d00001019sv*sd*bc*sc*i*
alias: pci:v000015B3d00001018sv*sd*bc*sc*i*
alias: pci:v000015B3d00001017sv*sd*bc*sc*i*
alias: pci:v000015B3d00001016sv*sd*bc*sc*i*
alias: pci:v000015B3d00001015sv*sd*bc*sc*i*
alias: pci:v000015B3d00001014sv*sd*bc*sc*i*
alias: pci:v000015B3d00001013sv*sd*bc*sc*i*
alias: pci:v000015B3d00001012sv*sd*bc*sc*i*
alias: pci:v000015B3d00001011sv*sd*bc*sc*i*
depends: devlink,ptp,mlxfw
retpoline: Y
intree: Y
name: mlx5_core
vermagic: 4.15.0-36-generic SMP mod_unload
signat: PKCS#7
signer:
sig_key:
sig_hashalgo: md4
parm: debug_mask:debug mask: 1 = dump cmd data, 2 = dump cmd exec time, 3 = both. Default=0 (uint)
parm: prof_sel:profile selector. Valid range 0 - 2 (uint)
So srcversion is different, even though version is the same.
But I tried MFT and it fails. You're correct that it seems to be the best documented and the easiest, but that is where it failed, and I started looking at the other tools.
##########
[root@x385004:/tmp] wget http://www.mellanox.com/downloads/MFT/vmware_6.5_native/nmst-4.10.0.104-1OEM.650.0.0.4598673.x86_64.vib
Connecting to www.mellanox.com (72.21.92.229:80)
nmst-4.10.0.104-1OEM 100% |***************************************************************************************************************************************************************| 19198 0:00:00 ETA
[root@x385004:/tmp]
[root@x385004:/tmp] esxcli software vib install -v nmst-4.10.0.104-1OEM.650.0.0.4598673.x86_64.vib
[VibDownloadError]
('nmst-4.10.0.104-1OEM.650.0.0.4598673.x86_64.vib', '/tmp/vib_n6wujfpe', "unknown url type: 'nmst-4.10.0.104-1OEM.650.0.0.4598673.x86_64.vib'")
url = nmst-4.10.0.104-1OEM.650.0.0.4598673.x86_64.vib
Please refer to the log file for more details.
[root@x385004:/tmp] wget http://www.mellanox.com/downloads/MFT/vmware_6.5_native/mft-4.10.0.104-10EM-650.0.0.4598673.x86_64.vib
Connecting to www.mellanox.com (72.21.92.229:80)
mft-4.10.0.104-10EM- 100% |***************************************************************************************************************************************************************| 35284k 0:00:00 ETA
[root@x385004:/tmp] esxcli software vib install -v /tmp/mft-4.10.0.104-10EM-650.0.0.4598673.x86_64.vib
[VibDownloadError]
('/tmp/mft-4.10.0.104-10EM-650.0.0.4598673.x86_64.vib', 'Bad VIB archive header')
url = /tmp/mft-4.10.0.104-10EM-650.0.0.4598673.x86_64.vib
Please refer to the log file for more details.
[root@x385004:/tmp]
#######
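Looking at the transcript again, the first failure is probably just because esxcli wants a full path to the VIB rather than a bare file name, so I will retry it as:

esxcli software vib install -v /tmp/nmst-4.10.0.104-1OEM.650.0.0.4598673.x86_64.vib

The second file name also reads 10EM where the nmst one reads 1OEM, so the mft download URL may simply have been wrong, which would explain the "Bad VIB archive header"; I will re-download it with the file name copied from the MFT download page.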
Hi,
I suggest opening a support case with Mellanox at support@mellanox.com and asking for the recommended performance tuning for such a scenario.
Thanks,
Samer
The documentation says that ConnectX-5 inherits ConnectX-4's ability to support Mellanox Multi-Host, and that it needs to be enabled.
How does one enable it?
Particulars:
MCX556A-EDAT
MT_0000000009
ConnectX-5 Ex VPI adapter card
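So far my understanding is that firmware options of this kind are normally exposed through mlxconfig, so I assume the first step is to dump the configuration and look for a multi-host related entry (the MST device name below is a placeholder, and I have not confirmed the exact parameter name):

mst start
mlxconfig -d /dev/mst/mt4121_pciconf0 query                  # list all firmware configuration parameters
mlxconfig -d /dev/mst/mt4121_pciconf0 query | grep -i host   # look for a multi-host related entry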
Mellanox has already released and published a new WinOF-2 driver, v2.0.5100, that supports Windows Server 2019.
see link: http://www.mellanox.com/page/products_dyn?product_family=32&mtag=windows_sw_drivers
Read carefully in the Release Notes all the limitations and features that are supported in this new driver:
http://www.mellanox.com/related-docs/prod_software/WinOF2_Release_Notes_v2.0.pdf