Channel: Mellanox Interconnect Community: Message List

error packets


Hello, everybody!

I have errors on the physical interfaces between Mellanox switches connected in an MLAG.

The switches are connected by a Mellanox active cable (XLPPI). The errors appear once every few days, about 1000 at a time.

You can see the interface statistics in the attached file.

What could be the reason for these errors?

Could it be a queue problem?

 


Re: Assign a MAC to a VLAN


Hi,

What is the idea? Why do you need it that way?

Re: error packets


Hi,

I see there are RX FCS errors on those physical interfaces. FCS errors are an indication of CRC errors, which are generally a layer-1 issue caused by a faulty port on the device or a bad cable.

You could try the following and see if it helps:

1. Reseat the cables.

2. Replace the cable with a known good working cable.

If the problem still exists, please open a case with us and we will help you resolve this issue.
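As background, the FCS at the end of every Ethernet frame is a plain CRC-32 over the frame contents; a minimal userspace sketch of that checksum (for illustration only, not Mellanox driver code):

```c
#include <stdint.h>
#include <stddef.h>

/* Standard Ethernet CRC-32 (reflected, polynomial 0xEDB88320), the same
 * checksum the NIC appends as the frame FCS. An RX FCS error means the
 * received frame no longer matches this CRC, i.e. bits were flipped in
 * transit -- almost always a layer-1 (cable/connector/port) problem. */
static uint32_t crc32_fcs(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}
```

The canonical check value for this algorithm: crc32_fcs over the ASCII bytes "123456789" is 0xCBF43926.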

 

Thanks,

Pratik

Re: InfiniBand amber port led flashing


Hi Ken,

 

Port LED Flashing Amber means one or more ports have received symbol errors.

Possible causes are:

• Bad cable

• Bad connection

• Bad connector

Check symbol error counters on the system UI to identify the ports. Replace the cable on these ports.

Since you have already replaced the cable on this port, there are no more symbol errors received and you see the LED becoming solid green.

 

Thanks,

Pratik

Re: Various ping programs segfaulting


This turned out to be a nasty little bug. It turns out there is a place where the rxe driver registers memory using an area of memory that is not available on the ARM processor we are using. Here's the patch that made it work...

 

2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 5c2684b..f2dc5a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -31,6 +31,7 @@
  * SOFTWARE.
  */

+#include <linux/highmem.h>
 #include "rxe.h"
 #include "rxe_loc.h"

@@ -94,7 +95,15 @@ static void rxe_mem_init(int access, struct rxe_mem *mem)
 void rxe_mem_cleanup(struct rxe_pool_entry *arg)
 {
 	struct rxe_mem *mem = container_of(arg, typeof(*mem), pelem);
-	int i;
+	int i, entry;
+	struct scatterlist *sg;
+
+	if (mem->kmap_occurred) {
+		for_each_sg(mem->umem->sg_head.sgl, sg,
+			    mem->umem->nmap, entry) {
+			kunmap(sg_page(sg));
+		}
+	}

 	if (mem->umem)
 		ib_umem_release(mem->umem);
@@ -200,12 +209,14 @@ int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
 		buf = map[0]->buf;

 		for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
-			vaddr = page_address(sg_page(sg));
+			// vaddr = page_address(sg_page(sg));
+			vaddr = kmap(sg_page(sg));
 			if (!vaddr) {
 				pr_warn("null vaddr\n");
 				err = -ENOMEM;
 				goto err1;
 			}
+			mem->kmap_occurred = 1;

 			buf->addr = (uintptr_t)vaddr;
 			buf->size = BIT(umem->page_shift);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index af1470d..9bd7eac 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -343,6 +343,8 @@ struct rxe_mem {
 	u32			num_map;

 	struct rxe_map		**map;
+
+	int			kmap_occurred;
 };

 struct rxe_mc_grp {
--
2.7.4

 

The idea is that you need to use kmap()/kunmap() rather than page_address() to handle these memory regions, which are used by both the kernel and user space; that is what makes this work on ARM...

 

Thanks,

FM

Building kernel module with ib client (un)register functions


I have written minimal code for a kernel module that registers an RDMA client using the two functions ib_register_client() and ib_unregister_client(). The compiled code and the source can be downloaded from the repository: https://github.com/sSadin/rdma_core_init.git

The compilation is successful. However, the module doesn't load; it generates errors in the system log:

... rdma_init: disagrees about version of symbol ib_unregister_client

... rdma_init: Unknown symbol ib_unregister_client (err -22)

... rdma_init: disagrees about version of symbol ib_register_client

... rdma_init: Unknown symbol ib_register_client (err -22)

----------------------------------

 

Installed OS: Ubuntu 16.04

@uname -r

4.4.114

 

Installed Mellanox software: MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64.tgz

with command:

@./mlnxofedinstall --add-kernel-support

 

After the install, there are new directories:

/usr/src/mlnx-ofed-kernel-4.4/include

/usr/src/ofa_kernel/default/include

with the includes. But /usr/src/linux-headers-4.4.0-116/include still has the "old" versions of the files.

----------------------------------

@modinfo rdma_core_init.ko

srcversion:     21C176F120C52D1ED6D19F1

depends:        ib_core

vermagic:       4.4.114

----------------------------------

@modinfo ib_core

filename:       /lib/modules/4.4.114/updates/dkms/ib_core.ko

description:    core kernel InfiniBand API

srcversion:     A1112DAE0CC4C253540C773

depends:        mlx_compat

vermagic:       4.4.114

 

Note: if I open the generated file rdma_init.mod.c:

  { 0x51b43427, __VMLINUX_SYMBOL_STR(ib_register_client) },

and open the file ib_core.ko from the path /lib/modules/4.4.114/build/drivers/infiniband/core, the CRC for this function is the same:

0000000051b43427 A __crc_ib_register_client

But the command [modinfo ib_core] points to the path /lib/modules/4.4.114/updates/dkms, where the CRC for this function is:

00000000b184c3d5 A __crc_ib_register_client

 

Q: what should I do to compile and load the module correctly?
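The "disagrees about version of symbol" message means modpost recorded the CRCs from the inbox kernel tree, while the loaded ib_core.ko is the MLNX OFED DKMS build with different CRCs. One way out is to build the module against the OFED tree. A sketch of such a kbuild makefile follows; KBUILD_EXTRA_SYMBOLS is the stock kbuild mechanism, but the exact paths are taken from the directory listing above and should be treated as assumptions to verify on your system:

```makefile
# Sketch: build the module so modpost sees the MLNX OFED symbol CRCs
# for ib_core instead of the inbox ones. Paths are assumptions based on
# the directories listed above.
obj-m := rdma_core_init.o

# Compile against the OFED headers so structure layouts match ib_core.
ccflags-y += -I/usr/src/ofa_kernel/default/include

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(CURDIR) \
		KBUILD_EXTRA_SYMBOLS=/usr/src/ofa_kernel/default/Module.symvers \
		modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(CURDIR) clean
```

After rebuilding this way, modinfo on the module should show a srcversion whose symbol CRCs match the DKMS ib_core.ko rather than the one under /lib/modules/4.4.114/build.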

 

Re: How to enable VF multi-queue for SR-IOV on KVM?


Where can I open a technical support case?

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

What is your current system? Distribution / kernel?

ConnectX-3 Pro?

FW version?

PSID?

Can you try with the latest Mellanox OFED 4.4?

 

Maybe p4p1 is not the Mellanox interface?

Maybe it is not configured as an Ethernet interface?

 

Please check

 

Marc


why not just BUG_ON(!pci_channel_offline(dev->persist->pdev))


diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c
index 715de8a..e866082 100644
--- a/drivers/net/ethernet/mellanox/mlx4/catas.c
+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c
@@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist)
 		err = mlx4_reset_slave(dev);
 	else
 		err = mlx4_reset_master(dev);
-	BUG_ON(err != 0);
+
+	if (!err)
+		mlx4_err(dev, "device was reset successfully\n");
+	else
+		/* EEH could have disabled the PCI channel during reset. That's
+		 * recoverable and the PCI error flow will handle it.
+		 */
+		if (!pci_channel_offline(dev->persist->pdev))
+			BUG_ON(1);

 	dev->persist->state |= MLX4_DEVICE_STATE_INTERNAL_ERROR;
-	mlx4_err(dev, "device was reset successfully\n");
 	mutex_unlock(&persist->device_state_mutex);

Re: "Priority trust-mode is not supported on your system"?


Hi, Marc.

The system information is as follows (screenshot):

深度截图_选择区域_20180731212715.png

The RNIC information is as follows (screenshot):

深度截图_选择区域_20180731220447.png

And I upgraded OFED to 4.4; the result is the same.

I have checked the interface parameter and it is correct.

Thanks.

Re: "Priority trust-mode is not supported on your system"?


Hi

Can you show me the ibdev2netdev output?

Can you also try:

mlnx_qos -i <interface>

Thanks

Marc

Re: "Priority trust-mode is not supported on your system"?


Hi, Marc

Ah...

Sorry, after upgrading OFED to 4.4, the trust mode can be set on the RNIC.

But there is another error message: "Buffers commands are not supported on your system".

DeepinScreenshot_select-area_20180731230506.png

Thanks.

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

Can you try to modify the buffer size and send me the output?

ibdev2netdev also, please.

 

Marc

Re: "Priority trust-mode is not supported on your system"?


Hi, Marc

The result is as follows:

DeepinScreenshot_select-area_20180731232703.png

It seems that the buffer commands are a new feature for PFC in OFED 4.4.

I checked OFED 4.3 and didn't see this option in mlnx_qos.

Thanks.

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

After a first check on my ConnectX-3 card, I got the same behavior.

It seems to be supported only on ConnectX-4 and above.

 

If you want me to investigate it more, please open a ticket.

 

# mlnx_qos -i ens6

Buffers commands are not supported on your system

 

 

Marc


Re: rx_fifo_errors and rx_dropped errors using VMA where CPU user less than 40%


If you are seeing the same behaviour without VMA, why complicate the problem? Start by tuning the system and see if that helps; adding more components will not help troubleshooting. After tuning, I would suggest checking netstat -s/nstat and 'netstat -unp' to look at the receive queue size.

The tuning guides are available on the Mellanox site - Performance Tuning for Mellanox Adapters

You might also check the current number of send/receive queues configured on the interface and try limiting it to 16:

ethtool -L <IFS> rx 16 tx 16

Re: rx_fifo_errors and rx_dropped errors using VMA where CPU user less than 40%


Hi Alkx,

 

Thanks for your reply. I've done all the performance tuning steps from the site you recommended. I tried VMA because I was expecting someone would say "Have you tried VMA?"; also, vma_stats seems to give more visibility into the various buffer sizes (and errors) than is available via the kernel.

 

I monitor /proc/net/udp. With VMA off, it shows no drops and rarely more than a few MB in the UDP buffer (I think this is equivalent to netstat -unp).
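For reference, the two numbers of interest in each /proc/net/udp line are the rx_queue half of the tx_queue:rx_queue column and the drops counter in the last column; a small userspace C helper to pull them out of a line (a sketch only, following the kernel's documented /proc/net/udp field layout):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one data line of /proc/net/udp. Fields (whitespace separated):
 *   sl local rem st tx_queue:rx_queue tr:tm->when retrnsmt uid timeout
 *   inode ref pointer drops
 * Fills rx_queue (bytes pending, hex in the file) and drops (decimal).
 * Returns 0 on success, -1 on a malformed line. */
static int parse_udp_line(const char *line, unsigned long *rx_queue,
                          unsigned long *drops)
{
    char txrx[64];

    /* Skip sl, local, rem, st; grab the tx_queue:rx_queue token. */
    if (sscanf(line, "%*s %*s %*s %*s %63s", txrx) != 1)
        return -1;
    const char *colon = strchr(txrx, ':');
    if (!colon)
        return -1;
    *rx_queue = strtoul(colon + 1, NULL, 16);

    /* drops is the last whitespace-separated token on the line. */
    const char *p = line + strlen(line);
    while (p > line && (p[-1] == '\n' || p[-1] == ' '))
        p--;
    while (p > line && p[-1] != ' ')
        p--;
    *drops = strtoul(p, NULL, 10);
    return 0;
}
```

Reading the file once a second and watching whether rx_queue climbs toward the socket buffer limit before drops increments gives the same picture as the netstat check above.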

 

Thanks for the tip on ethtool -L. Below are my current settings. I'll have a play with it and see if things improve. I hadn't seen that before. I wonder why it isn't in the tuning guides?

 

Also:

- What's the difference between the 'rings' (ethtool -g) and 'channels' (ethtool -L)?

- Why does making the channels smaller help?

 

ban115@tethys:~$ /sbin/ethtool -g enp132s0

Ring parameters for enp132s0:

Pre-set maximums:

RX:        8192

RX Mini:    0

RX Jumbo:    0

TX:        8192

Current hardware settings:

RX:        8192

RX Mini:    0

RX Jumbo:    0

TX:        512

 

ban115@tethys:~$ /sbin/ethtool -L enp132s0

no channel parameters changed, aborting

current values: tx 8 rx 32 other 0 combined 0

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA


Hi Jainkun yang,

Sorry for very late reply.

I am getting 7 microseconds of latency for the smallest message sizes.

 

When I run the osu_bw test, I see that system memory is also being used along with GPU memory. This seems strange, right? With GPUDirect RDMA, we should not see any system memory usage, right? Am I missing something?

lspci -tv output for both systems:

+-[0000:80]-+-00.0-[81]--

|           +-01.0-[82]--

|           +-01.1-[83]--

|           +-02.0-[84]--

|           +-02.2-[85]----00.0  Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

|           +-03.0-[86]----00.0  NVIDIA Corporation Device 15f8

 

 

On Host Systems:

80:02.2 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])

80:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])

 

On Peer System:

80:02.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 01) (prog-if 00 [Normal decode])

80:03.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01) (prog-if 00 [Normal decode])

 

Host CPU:

# lscpu

Architecture:          x86_64

CPU op-mode(s):        32-bit, 64-bit

Byte Order:            Little Endian

CPU(s):                72

On-line CPU(s) list:   0-71

Thread(s) per core:    2

Core(s) per socket:    18

Socket(s):             2

NUMA node(s):          1

Vendor ID:             GenuineIntel

CPU family:            6

Model:                 63

Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

Stepping:              2

CPU MHz:               1202.199

CPU max MHz:           3600.0000

CPU min MHz:           1200.0000

BogoMIPS:              4590.86

Virtualization:        VT-x

L1d cache:             32K

L1i cache:             32K

L2 cache:              256K

L3 cache:              46080K

NUMA node0 CPU(s):     0-71

Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

 

Peer CPU:

 

# lscpu

Architecture:          x86_64

CPU op-mode(s):        32-bit, 64-bit

Byte Order:            Little Endian

CPU(s):                32

On-line CPU(s) list:   0-31

Thread(s) per core:    2

Core(s) per socket:    8

Socket(s):             2

NUMA node(s):          1

Vendor ID:             GenuineIntel

CPU family:            6

Model:                 79

Model name:            Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

Stepping:              1

CPU MHz:               1201.019

CPU max MHz:           3000.0000

CPU min MHz:           1200.0000

BogoMIPS:              4191.23

Virtualization:        VT-x

L1d cache:             32K

L1i cache:             32K

L2 cache:              256K

L3 cache:              20480K

NUMA node0 CPU(s):     0-31

Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

RoCEv2 PFC/ECN Issues


We have two servers with ConnectX-4 100Ge cards and two Cisco C3232C switches with routing between them and are trying to get RoCEv2 routing through with PFC/ECN to provide the best performance during periods of congestion.

 

The funny thing is that using the base configuration with no other servers on the switches, we get terrible performance (1.6 Gbps) across the routed link using iSER, even though we are only pushing about 20 Gbps (1 iSER connection and our test workload configuration). By using multiple iSER connections and PFC, we can get about 95 Gbps, so we know that the hardware is capable of the performance in routing mode. We can't understand why the performance is so bad in the default case. The fio test shows that a lot of IO happens, then there is none, and it just cycles back and forth.

 

We would like to use both PFC and ECN for our configuration, but we are trying to validate that ECN will work without PFC; when we disable PFC, we can't test ECN, most likely because of the above issue.

 

On the Cisco switches, we have policy maps that places our traffic with the DSCP markings into a group that has ECN enabled (I'm not a Cisco person, so I may not be getting the terminology quite right) and we can see the group counters on the Cisco incrementing. We don't ever see any packets marked with congestion, probably because the switch never sees any due to the above problem.

 

When we have the client set to 40 Gbps and do a read test with PFC, we get pause frames and great performance. We have the Cisco switches match the DSCP value and re-mark the CoS for packets that traverse the router. (Interestingly, Cisco sends PFC pause frames on the routed link even though there are no VLANs configured; we captured it in Wireshark. With the adapters set to --trust=pcp the performance is terrible, but --trust=dscp works well.) The Cisco switches also show pause frame counters incrementing when we are 100g end to end. I'm not sure why they would be incrementing when there is no congestion.

 

We have done so many permutations of tests, that I may be getting fuzzy in some details. Here is a matrix of some tests that I can be sure of. This is all 100g end to end.

 

switch PFC mode (ports) | trust mode           | pfc prio 3 enabled               | skprio -> cos mapping                    | Result
(static on/off)         | (mlnx_qos --trust=X) | (mlnx_qos --pfc=0,0,0,X,0,0,0,0) | (ip link set rsY.Z type vlan egress 2:3) |
on                      | pcp                  | yes                              | yes                                      | Good
on                      | pcp                  | yes                              | no                                       | Good
on                      | pcp                  | no                               | yes                                      | Bad
on                      | pcp                  | no                               | no                                       | Bad
on                      | dscp                 | yes                              | yes                                      | Good
on                      | dscp                 | yes                              | no                                       | Good
on                      | dscp                 | no                               | yes                                      | Bad
on                      | dscp                 | no                               | no                                       | Bad
off                     | pcp                  | yes                              | yes                                      | Bad
off                     | pcp                  | yes                              | no                                       | Bad
off                     | pcp                  | no                               | yes                                      | Bad
off                     | pcp                  | no                               | no                                       | Bad
off                     | dscp                 | yes                              | yes                                      | Bad
off                     | dscp                 | yes                              | no                                       | Bad
off                     | dscp                 | no                               | yes                                      | Bad
off                     | dscp                 | no                               | no                                       | Bad

 

We are using OFED 4.4-1.0.0.0 on both nodes, one is CentOS 7.3, the other CentOS 7.4, running 4.9.116 and the firmware is 12.23.1000 on one card and 12.23.1020 on the other. In addition to the above matrix, we have only changed:

 

echo 26 > /sys/class/net/rs8bp2/ecn/roce_np/cnp_dscp

echo 106 > /sys/kernel/config/rdma_cm/mlx5_3/ports/1/default_roce_tos

 

If you have any ideas that we can try, we would appreciate it.

 

Thank you.
