Channel: Mellanox Interconnect Community: Message List

error packets


Hello, everybody!

I have errors on the physical interfaces between Mellanox switches connected in an MLAG.

The switches are connected by a Mellanox active cable (XLPPI). The errors appear once every few days, about 1000 at a time.

You can see the interface statistics in the attached file.

What could be the reason for these errors?

Could it be a queue problem?

 


Re: Assign a MAC to a VLAN


Hi,

What is the idea? Why do you need it that way?

Re: error packets


Hi,

I see there are RX FCS errors on those physical interfaces. FCS errors are an indication of CRC errors, which are generally a layer-1 issue caused by a faulty port on the device or a bad cable.

You could try the following and see if it helps:

1. Reseat the cables.

2. Replace the cable with a known good working cable.

If the problem still exists, please open a case with us and we will help you resolve this issue.
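As background, the FCS at the end of every Ethernet frame is a plain CRC-32 over the frame contents; a minimal userspace sketch of that checksum (for illustration only, not Mellanox driver code):

```c
#include <stdint.h>
#include <stddef.h>

/* Standard Ethernet CRC-32 (reflected, polynomial 0xEDB88320), the same
 * checksum the NIC appends as the frame FCS. An RX FCS error means the
 * received frame no longer matches this CRC, i.e. bits were flipped in
 * transit -- almost always a layer-1 (cable/connector/port) problem. */
static uint32_t crc32_fcs(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}
```

The canonical check value for this algorithm: crc32_fcs over the ASCII bytes "123456789" is 0xCBF43926.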

 

Thanks,

Pratik

Re: InfiniBand amber port led flashing


Hi Ken,

 

Port LED Flashing Amber means one or more ports have received symbol errors.

Possible causes are:

• Bad cable

• Bad connection

• Bad connector

Check symbol error counters on the system UI to identify the ports. Replace the cable on these ports.

Since you have already replaced the cable on this port, there are no more symbol errors received and you see the LED becoming solid green.

 

Thanks,

Pratik

Re: Various ping programs segfaulting


This turned out to be a nasty little bug. It turns out there is a place where the rxe driver registers memory using an area of memory that is not available on the ARM processor we are using. Here's the patch that made it work...

 

2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 5c2684b..f2dc5a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -31,6 +31,7 @@
  * SOFTWARE.
  */

+#include <linux/highmem.h>
 #include "rxe.h"
 #include "rxe_loc.h"

@@ -94,7 +95,15 @@ static void rxe_mem_init(int access, struct rxe_mem *mem)
 void rxe_mem_cleanup(struct rxe_pool_entry *arg)
 {
 	struct rxe_mem *mem = container_of(arg, typeof(*mem), pelem);
-	int i;
+	int i, entry;
+	struct scatterlist *sg;
+
+	if (mem->kmap_occurred) {
+		for_each_sg(mem->umem->sg_head.sgl, sg,
+			    mem->umem->nmap, entry) {
+			kunmap(sg_page(sg));
+		}
+	}

 	if (mem->umem)
 		ib_umem_release(mem->umem);
@@ -200,12 +209,14 @@ int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
 		buf = map[0]->buf;

 		for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
-			vaddr = page_address(sg_page(sg));
+			// vaddr = page_address(sg_page(sg));
+			vaddr = kmap(sg_page(sg));
 			if (!vaddr) {
 				pr_warn("null vaddr\n");
 				err = -ENOMEM;
 				goto err1;
 			}
+			mem->kmap_occurred = 1;

 			buf->addr = (uintptr_t)vaddr;
 			buf->size = BIT(umem->page_shift);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index af1470d..9bd7eac 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -343,6 +343,8 @@ struct rxe_mem {
 	u32			num_map;

 	struct rxe_map		**map;
+
+	int			kmap_occurred;
 };

 struct rxe_mc_grp {
--
2.7.4

 

The idea is that you need to use kmap()/kunmap() rather than page_address() to handle these memory regions, which are used by both the kernel and user space; that is what makes this work on ARM...

 

Thanks,

FM

Building kernel module with ib client (un)register functions


I have written minimal code for a kernel module that registers an RDMA client using the two functions ib_register_client() and ib_unregister_client(). The compiled code and the source can be downloaded from the repository: https://github.com/sSadin/rdma_core_init.git

The compilation is successful. However, the module doesn't load; it generates errors in the system log:

... rdma_init: disagrees about version of symbol ib_unregister_client

... rdma_init: Unknown symbol ib_unregister_client (err -22)

... rdma_init: disagrees about version of symbol ib_register_client

... rdma_init: Unknown symbol ib_register_client (err -22)

----------------------------------

 

Installed OS: Ubuntu 16.04

@uname -r

4.4.114

 

Installed Mellanox software: MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64.tgz

with command:

@./mlnxofedinstall --add-kernel-support

 

After the install, there are new directories:

/usr/src/mlnx-ofed-kernel-4.4/include

/usr/src/ofa_kernel/default/include

with the includes. But /usr/src/linux-headers-4.4.0-116/include still has the "old" versions of the files.

----------------------------------

@modinfo rdma_core_init.ko

srcversion:     21C176F120C52D1ED6D19F1

depends:        ib_core

vermagic:       4.4.114

----------------------------------

@modinfo ib_core

filename:       /lib/modules/4.4.114/updates/dkms/ib_core.ko

description:    core kernel InfiniBand API

srcversion:     A1112DAE0CC4C253540C773

depends:        mlx_compat

vermagic:       4.4.114

 

Note: if I open the generated file rdma_init.mod.c:

  { 0x51b43427, __VMLINUX_SYMBOL_STR(ib_register_client) },

and open the file ib_core.ko from the path /lib/modules/4.4.114/build/drivers/infiniband/core, the CRC for this function is the same:

0000000051b43427 A __crc_ib_register_client

But the command [modinfo ib_core] points to the path /lib/modules/4.4.114/updates/dkms, where the CRC for this function is:

00000000b184c3d5 A __crc_ib_register_client

 

Q: what should I do to compile and load the module correctly?
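The "disagrees about version of symbol" message means modpost recorded the CRCs from the inbox kernel tree, while the loaded ib_core.ko is the MLNX OFED DKMS build with different CRCs. One way out is to build the module against the OFED tree. A sketch of such a kbuild makefile follows; KBUILD_EXTRA_SYMBOLS is the stock kbuild mechanism, but the exact paths are taken from the directory listing above and should be treated as assumptions to verify on your system:

```makefile
# Sketch: build the module so modpost sees the MLNX OFED symbol CRCs
# for ib_core instead of the inbox ones. Paths are assumptions based on
# the directories listed above.
obj-m := rdma_core_init.o

# Compile against the OFED headers so structure layouts match ib_core.
ccflags-y += -I/usr/src/ofa_kernel/default/include

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(CURDIR) \
		KBUILD_EXTRA_SYMBOLS=/usr/src/ofa_kernel/default/Module.symvers \
		modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(CURDIR) clean
```

After rebuilding this way, modinfo on the module should show a srcversion whose symbol CRCs match the DKMS ib_core.ko rather than the one under /lib/modules/4.4.114/build.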

 

Re: How to enable VF multi-queue for SR-IOV on KVM?


Where can I open a technical support case?

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

What is your current system? Distribution / kernel?

ConnectX-3 Pro?

FW version?

PSID?

Can you try with the latest Mellanox OFED 4.4?

 

Maybe p4p1 is not the Mellanox interface?

Maybe it is not configured as an Ethernet interface?

 

Please check

 

Marc


why not just BUG_ON(!pci_channel_offline(dev->persist->pdev))


diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c
index 715de8a..e866082 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist)
 		err = mlx4_reset_slave(dev);
 	else
 		err = mlx4_reset_master(dev);
-	BUG_ON(err != 0);
+
+	if (!err)
+		mlx4_err(dev, "device was reset successfully\n");
+	else
+		/* EEH could have disabled the PCI channel during reset. That's
+		 * recoverable and the PCI error flow will handle it.
+		 */
+		if (!pci_channel_offline(dev->persist->pdev))
+			BUG_ON(1);

 	dev->persist->state |= MLX4_DEVICE_STATE_INTERNAL_ERROR;
-	mlx4_err(dev, "device was reset successfully\n");
 	mutex_unlock(&persist->device_state_mutex);

Re: "Priority trust-mode is not supported on your system"?


Hi, Marc.

The system information is as follows (screenshot):

深度截图_选择区域_20180731212715.png

The RNIC information is as follows (screenshot):

深度截图_选择区域_20180731220447.png

And I upgraded OFED to 4.4; the result is the same.

I have checked the interface parameter and it is correct.

Thanks.

Re: "Priority trust-mode is not supported on your system"?


Hi

Can you show me the ibdev2netdev output?

Can you also try:

mlnx_qos -i <interface>

Thanks

Marc

Re: "Priority trust-mode is not supported on your system"?


Hi, Marc

Ah...

Sorry, after upgrading OFED to 4.4, the trust mode can be set on the RNIC.

But there is another error message: "Buffers commands are not supported on your system".

DeepinScreenshot_select-area_20180731230506.png

Thanks.

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

Can you try to modify the buffer size and send me the output?

ibdev2netdev also, please.

 

Marc

Re: "Priority trust-mode is not supported on your system"?


Hi, Marc

The result is as follows:

DeepinScreenshot_select-area_20180731232703.png

It seems that the buffer commands are a new feature for PFC in OFED 4.4.

I checked OFED 4.3 and didn't see this option in mlnx_qos.

Thanks.

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

After a first check on my ConnectX-3 card, I got the same behavior.

It seems to be supported only on ConnectX-4 and above.

 

If you want me to investigate it more, please open a ticket.

 

# mlnx_qos -i ens6

Buffers commands are not supported on your system

 

 

Marc


Re: rx_fifo_errors and rx_dropped errors using VMA where CPU user less than 40%


If you are seeing the same behaviour without VMA, why complicate the problem? Start by tuning the system and see if that helps; adding more components will not help troubleshooting. After tuning, I would suggest checking netstat -s/nstat and 'netstat -unp' to look at the receive queue size.

The tuning guides are available on the Mellanox site - Performance Tuning for Mellanox Adapters

You might also check the current number of send/receive queues configured on the interface and try limiting it to 16:

ethtool -L <IFS> rx 16 tx 16

Re: rx_fifo_errors and rx_dropped errors using VMA where CPU user less than 40%


Hi Alkx,

 

Thanks for your reply. I've done all the performance tuning steps from the site you recommended. I tried VMA because I was expecting someone would say "Have you tried VMA?"; also, vma_stats seems to give more visibility into the various buffer sizes (and errors) than is available via the kernel.

 

I monitor /proc/net/udp. With VMA off, it shows no drops and rarely more than a few MB in the UDP buffer (I think this is equivalent to netstat -unp).
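For reference, the two numbers of interest in each /proc/net/udp line are the rx_queue half of the tx_queue:rx_queue column and the drops counter in the last column; a small userspace C helper to pull them out of a line (a sketch only, following the kernel's documented /proc/net/udp field layout):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one data line of /proc/net/udp. Fields (whitespace separated):
 *   sl local rem st tx_queue:rx_queue tr:tm->when retrnsmt uid timeout
 *   inode ref pointer drops
 * Fills rx_queue (bytes pending, hex in the file) and drops (decimal).
 * Returns 0 on success, -1 on a malformed line. */
static int parse_udp_line(const char *line, unsigned long *rx_queue,
                          unsigned long *drops)
{
    char txrx[64];

    /* Skip sl, local, rem, st; grab the tx_queue:rx_queue token. */
    if (sscanf(line, "%*s %*s %*s %*s %63s", txrx) != 1)
        return -1;
    const char *colon = strchr(txrx, ':');
    if (!colon)
        return -1;
    *rx_queue = strtoul(colon + 1, NULL, 16);

    /* drops is the last whitespace-separated token on the line. */
    const char *p = line + strlen(line);
    while (p > line && (p[-1] == '\n' || p[-1] == ' '))
        p--;
    while (p > line && p[-1] != ' ')
        p--;
    *drops = strtoul(p, NULL, 10);
    return 0;
}
```

Reading the file once a second and watching whether rx_queue climbs toward the socket buffer limit before drops increments gives the same picture as the netstat check above.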

 

Thanks for the tip on ethtool -L. Below are my current settings. I'll have a play with it and see if things improve. I hadn't seen that before. I wonder why it isn't in the tuning guides?

 

Also:

- What's the difference between the 'rings' (ethtool -g) and 'channels' (ethtool -L)?

- Why does making the channels smaller help?

 

ban115@tethys:~$ /sbin/ethtool -g enp132s0

Ring parameters for enp132s0:

Pre-set maximums:

RX:        8192

RX Mini:    0

RX Jumbo:    0

TX:        8192

Current hardware settings:

RX:        8192

RX Mini:    0

RX Jumbo:    0

TX:        512

 

ban115@tethys:~$ /sbin/ethtool -L enp132s0

no channel parameters changed, aborting

current values: tx 8 rx 32 other 0 combined 0

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA


Hi Jainkun yang,

Sorry for very late reply.

I am getting 7 microseconds of latency for the smallest message sizes.

 

When I run the osu_bw test, I see that system memory is also being used along with GPU memory. This seems strange, right? With GPUDirect RDMA, we should not see any system memory usage, right? Am I missing something?

lspci -tv output for both systems:

+-[0000:80]-+-00.0-[81]--

|           +-01.0-[82]--

|           +-01.1-[83]--

|           +-02.0-[84]--

|           +-02.2-[85]----00.0  Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

|           +-03.0-[86]----00.0  NVIDIA Corporation Device 15f8

 

 

On Host Systems:

80:02.2 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])

80:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])

 

On Peer System:

80:02.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 01) (prog-if 00 [Normal decode])

80:03.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01) (prog-if 00 [Normal decode])

 

Host CPU:

# lscpu

Architecture:          x86_64

CPU op-mode(s):        32-bit, 64-bit

Byte Order:            Little Endian

CPU(s):                72

On-line CPU(s) list:   0-71

Thread(s) per core:    2

Core(s) per socket:    18

Socket(s):             2

NUMA node(s):          1

Vendor ID:             GenuineIntel

CPU family:            6

Model:                 63

Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

Stepping:              2

CPU MHz:               1202.199

CPU max MHz:           3600.0000

CPU min MHz:           1200.0000

BogoMIPS:              4590.86

Virtualization:        VT-x

L1d cache:             32K

L1i cache:             32K

L2 cache:              256K

L3 cache:              46080K

NUMA node0 CPU(s):     0-71

Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

 

Peer CPU:

 

# lscpu

Architecture:          x86_64

CPU op-mode(s):        32-bit, 64-bit

Byte Order:            Little Endian

CPU(s):                32

On-line CPU(s) list:   0-31

Thread(s) per core:    2

Core(s) per socket:    8

Socket(s):             2

NUMA node(s):          1

Vendor ID:             GenuineIntel

CPU family:            6

Model:                 79

Model name:            Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

Stepping:              1

CPU MHz:               1201.019

CPU max MHz:           3000.0000

CPU min MHz:           1200.0000

BogoMIPS:              4191.23

Virtualization:        VT-x

L1d cache:             32K

L1i cache:             32K

L2 cache:              256K

L3 cache:              20480K

NUMA node0 CPU(s):     0-31

Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

RoCEv2 PFC/ECN Issues


We have two servers with ConnectX-4 100Ge cards and two Cisco C3232C switches with routing between them and are trying to get RoCEv2 routing through with PFC/ECN to provide the best performance during periods of congestion.

 

The funny thing is that using the base configuration with no other servers on the switches, we get terrible performance (1.6 Gbps) across the routed link using iSER, even though we are only pushing about 20 Gbps (1 iSER connection and our test workload configuration). By using multiple iSER connections and PFC, we can get about 95 Gbps, so we know that the hardware is capable of the performance in routing mode. We can't understand why the performance is so bad in the default case. The fio test shows that a lot of IO happens, then there is none, and it just cycles back and forth.

 

We would like to use both PFC and ECN for our configuration, but we are trying to validate that ECN will work without PFC; when we disable PFC, we can't test ECN, most likely because of the above issue.

 

On the Cisco switches, we have policy maps that places our traffic with the DSCP markings into a group that has ECN enabled (I'm not a Cisco person, so I may not be getting the terminology quite right) and we can see the group counters on the Cisco incrementing. We don't ever see any packets marked with congestion, probably because the switch never sees any due to the above problem.

 

When we have the client set to 40 Gbps and do a read test with PFC, we get pause frames and great performance. We have the Cisco switches match the DSCP value and re-mark the CoS for packets that traverse the router. (Interestingly, Cisco sends PFC pause frames on the routed link even though there are no VLANs configured; we captured it in Wireshark. With the adapters set to --trust=pcp the performance is terrible, but --trust=dscp works well.) The Cisco switches also show pause frame counters incrementing when we are 100g end to end. I'm not sure why they would be incrementing when there is no congestion.

 

We have done so many permutations of tests, that I may be getting fuzzy in some details. Here is a matrix of some tests that I can be sure of. This is all 100g end to end.

 

switch PFC mode (ports) | trust mode           | pfc prio 3 enabled               | skprio -> cos mapping                    | Result
(static on/off)         | (mlnx_qos --trust=X) | (mlnx_qos --pfc=0,0,0,X,0,0,0,0) | (ip link set rsY.Z type vlan egress 2:3) |
on                      | pcp                  | yes                              | yes                                      | Good
on                      | pcp                  | yes                              | no                                       | Good
on                      | pcp                  | no                               | yes                                      | Bad
on                      | pcp                  | no                               | no                                       | Bad
on                      | dscp                 | yes                              | yes                                      | Good
on                      | dscp                 | yes                              | no                                       | Good
on                      | dscp                 | no                               | yes                                      | Bad
on                      | dscp                 | no                               | no                                       | Bad
off                     | pcp                  | yes                              | yes                                      | Bad
off                     | pcp                  | yes                              | no                                       | Bad
off                     | pcp                  | no                               | yes                                      | Bad
off                     | pcp                  | no                               | no                                       | Bad
off                     | dscp                 | yes                              | yes                                      | Bad
off                     | dscp                 | yes                              | no                                       | Bad
off                     | dscp                 | no                               | yes                                      | Bad
off                     | dscp                 | no                               | no                                       | Bad

 

We are using OFED 4.4-1.0.0.0 on both nodes, one is CentOS 7.3, the other CentOS 7.4, running 4.9.116 and the firmware is 12.23.1000 on one card and 12.23.1020 on the other. In addition to the above matrix, we have only changed:

 

echo 26 > /sys/class/net/rs8bp2/ecn/roce_np/cnp_dscp

echo 106 > /sys/kernel/config/rdma_cm/mlx5_3/ports/1/default_roce_tos

 

If you have any ideas that we can try, we would appreciate it.

 

Thank you.
