Soft-RoCE on mininet topology

August 15, 2018, 11:45 pm

Hi Team,

On two Mininet VMs (on VirtualBox), I am able to run an RDMA client and server and can also send traffic using the rping tool (following the guide "HowTo Configure Soft-RoCE").

Issue:

I have created a one-switch, one-host topology on each VM and connected the two switches using a GRE tunnel. (Host 1 can ping Host 2, and Host 2 can ping Host 1.)

When I tried to couple the veth interface with an rxe device, I got the error "sh: echo: I/O error".

Can you please suggest how to get Soft-RoCE working on a Mininet topology?

Thanks

↧

Re: How to configure host chaining for ConnectX-5 VPI

August 16, 2018, 8:13 am

You're welcome!

I'm glad I helped someone after all the headache I went through for it.

I have no hands-on experience with VMware, so take all of this with a grain of salt.

My first thought is VLAN tags. I was told that VMware tags by default.

From my (limited) understanding, host chaining inside VMware is not a good idea.

If you set up a virtual switch (on the VMware side), put both ports of the card on that switch, and give the switch an IP, that would allow vMotion and such over the link at close to line speed, with the switch (analogous to Open vSwitch) doing all of the routing and fast-pathing.

Thoughts on what would happen with host chaining enabled:

VMware still sees both ports (we can't assign IPs to raw port interfaces to start with).

It doesn't really know which port to send out of, so traffic could take an extra hop before it gets to the destination.

With three nodes, traffic meant to go A -> B might take the path A -> C -> B.

Where I can speak from experience is non-chaining speed.

We did try Open vSwitch with chaining off on the cards. As long as STP was turned on, we got nearly line speed.

We opened a support ticket for our problems with MTU. It took a while, but we found the problem.

They have a nice little utility (sysinfo-snapshot) that shows the card internals and OS config options; looking through its output helped us.

↧

Can the cable of an AOC be replaced?

August 16, 2018, 2:01 pm

Hi all,

I've got some FDR AOCs with damaged cables. I'm hoping to reuse the transceivers instead of scrapping them. I opened up the top panel on one of the transceivers and saw that the cable does disconnect internally. Are there replacement cables that have those little ferrules on the end, or an adapter to convert the transceiver into a standalone module?

Thank you

↧

mlx5: ethtool -m not working

August 16, 2018, 1:54 pm

I have a ConnectX-4 2x100G. I'm running Linux 4.16.16 (Fedora) with the mlx5_core kernel module installed. ethtool -m does not appear to work with this setup. Other ethtool commands, such as ethtool -S, ethtool -i, and plain ethtool, work fine. I have an official Mellanox active optical cable transceiver plugged into the port. What is required to get the transceiver module info from the card? I've checked that the firmware is the latest version (MT_2150110033); the part number is MCX416A-CCAT.

$ ethtool -m enp9s0f0

Cannot get module EEPROM information: Input/output error

$ ethtool -i enp9s0f0

driver: mlx5_core

version: 5.0-0

firmware-version: 12.12.1100 (MT_2150110033)

expansion-rom-version:

bus-info: 0000:09:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: no

supports-register-dump: no

supports-priv-flags: yes

$ lspci | grep Mel

09:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

09:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

$ ethtool enp9s0f0

Settings for enp9s0f0:

Supported ports: [ FIBRE ]

Supported link modes: 10000baseKR/Full

40000baseCR4/Full

40000baseSR4/Full

40000baseLR4/Full

25000baseCR/Full

25000baseSR/Full

50000baseCR2/Full

100000baseSR4/Full

100000baseCR4/Full

100000baseLR4_ER4/Full

Supported pause frame use: Symmetric

Supports auto-negotiation: Yes

Supported FEC modes: Not reported

Advertised link modes: 10000baseKR/Full

40000baseCR4/Full

40000baseSR4/Full

40000baseLR4/Full

25000baseCR/Full

25000baseSR/Full

50000baseCR2/Full

100000baseSR4/Full

100000baseCR4/Full

100000baseLR4_ER4/Full

Advertised pause frame use: Symmetric

Advertised auto-negotiation: Yes

Advertised FEC modes: Not reported

Speed: 100000Mb/s

Duplex: Full

Port: FIBRE

PHYAD: 0

Transceiver: internal

Auto-negotiation: on

Supports Wake-on: d

Wake-on: d

Current message level: 0x00000004 (4)

link

Link detected: yes
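For anyone collecting the same details for a support ticket, here is a minimal sketch that turns the `ethtool -i` text from this post into a dictionary and separates the firmware version from the board ID in parentheses. The function name `parse_ethtool_info` and the `SAMPLE` string are made up for illustration; the sample is simply the output quoted above. As far as I know, `supports-eeprom-access` describes the adapter's own EEPROM dump (`ethtool -e`) rather than the module EEPROM read by `ethtool -m`, so treat that flag as a hint only.

```python
def parse_ethtool_info(text):
    """Parse 'key: value' lines, as printed by `ethtool -i`, into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI addresses
        # (which contain colons themselves) stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


# Sample copied from the post above (hypothetical variable name).
SAMPLE = """\
driver: mlx5_core
version: 5.0-0
firmware-version: 12.12.1100 (MT_2150110033)
expansion-rom-version:
bus-info: 0000:09:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
"""

info = parse_ethtool_info(SAMPLE)

# "12.12.1100 (MT_2150110033)" -> version string and the ID in parentheses.
fw_version, _, board_id = info["firmware-version"].partition(" ")
board_id = board_id.strip("()")

print(info["driver"], fw_version, board_id)
```

Note that in this output the firmware version is 12.12.1100, while MT_2150110033 is the identifier printed in parentheses after it, so "latest version" is best checked against the first value.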

↧

Re: Assign a MAC to a VLAN

August 16, 2018, 3:24 pm

Hi.

Thanks for your help!

↧

CX5 - bad system state

August 17, 2018, 6:47 am

I'm working with Xilinx Petalinux on a Xilinx PG213 core as root complex, so in general, there is no confidence in the HW or SW.

CX5 gets pretty far along before it fails with:

[ 4.447417] pci 0000:01:00.0: calling mellanox_check_broken_intx_masking+0x0/0x168

[ 4.454965] mlx5_core 0000:01:00.0: runtime IRQ mapping not provided by arch

[ 4.462017] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)

[ 4.468151] mlx5_core 0000:01:00.0: enabling bus mastering

[ 4.473941] mlx5_core 0000:01:00.0: firmware version: 16.22.1002

[ 4.700002] mlx5_core 0000:01:00.0: mlx5_cmd_check:710:(pid 1710): MANAGE_PAGES(0x108) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x4e2106)

[ 4.713926] mlx5_core 0000:01:00.0: give_pages:311:(pid 1710): func_id 0x0, npages 14972, err -5

[ 4.742890] mlx5_core 0000:01:00.0: failed to allocate init pages

Any clues on whether this points to a HW problem or a SW problem?

↧

Re: CX5 - bad system state

August 17, 2018, 7:18 am

Found the syndrome on:

Mellanox error syndrome lists · GitHub

BAD_SYS_STATE | 0x4E2106 | manage pages: failed to read io or write host mem
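If you need to do this lookup for more than one log line, the fields can be pulled out of the kernel message with a regular expression. The following is only a sketch: the pattern is written against the single `mlx5_cmd_check` line quoted in the original post, the names `parse_cmd_failure` and `KNOWN_SYNDROMES` are hypothetical, and the one table entry is the line found in the syndrome list above.

```python
import re

# The failing line from the original post, joined into one string.
LOG_LINE = (
    "[ 4.700002] mlx5_core 0000:01:00.0: mlx5_cmd_check:710:(pid 1710): "
    "MANAGE_PAGES(0x108) op_mod(0x1) failed, status bad system state(0x4), "
    "syndrome (0x4e2106)"
)

# Pattern assumed from that one sample; other kernel versions may format
# the message differently.
PATTERN = re.compile(
    r"(?P<cmd>[A-Z_0-9]+)\((?P<opcode>0x[0-9a-fA-F]+)\)\s+"
    r"op_mod\((?P<op_mod>0x[0-9a-fA-F]+)\)\s+failed,\s+"
    r"status\s+(?P<status>.+?)\((?P<status_code>0x[0-9a-fA-F]+)\),\s+"
    r"syndrome\s+\((?P<syndrome>0x[0-9a-fA-F]+)\)"
)

# Only the entry quoted in this thread; extend from the syndrome list as needed.
KNOWN_SYNDROMES = {
    0x4E2106: "manage pages: failed to read io or write host mem",
}


def parse_cmd_failure(line):
    """Return the command failure fields as a dict, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the hexadecimal fields to integers for easy table lookups.
    for key in ("opcode", "op_mod", "status_code", "syndrome"):
        fields[key] = int(fields[key], 16)
    return fields


failure = parse_cmd_failure(LOG_LINE)
print(failure["cmd"], hex(failure["syndrome"]),
      KNOWN_SYNDROMES.get(failure["syndrome"], "unknown syndrome"))
```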

↧

Keeping two versions driver for two kernels

August 19, 2018, 6:06 am

Hi,

How can I tell the installation script not to remove the old driver? I have two kernels (both needed):
1. CentOS 7.5 (./install --eth-only);
2. CentOS 7.5 + RT patch (compiled, ./install --eth-only --add-kernel-support).

Unfortunately, installing one driver uninstalls the other. This effectively blocks the use of the latest drivers for both kernels.

Please help.

Best Regards,

Robert

↧

Slow File Transfer On 20Gbps IB

August 19, 2018, 7:07 am

Dear All,

I am new to InfiniBand devices. I bought two Mellanox ConnectX-2 (20Gbps) cards on eBay and installed them in two Debian servers (PCIe 3.0, 8 lanes) with no problem. I measured about 15Gbps with iperf3, as follows:

iperf3 -c 10.20.0.34

Connecting to host 10.20.0.34, port 5201

[ 4] local 10.20.0.35 port 58208 connected to 10.20.0.34 port 5201

[ ID] Interval Transfer Bandwidth Retr Cwnd

[ 4] 0.00-1.00 sec 1.85 GBytes 15.9 Gbits/sec 0 11.9 MBytes

[ 4] 1.00-2.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ 4] 2.00-3.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ 4] 3.00-4.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ 4] 4.00-5.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ 4] 5.00-6.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ 4] 6.00-7.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ 4] 7.00-8.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ 4] 8.00-9.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

[ 4] 9.00-10.00 sec 1.82 GBytes 15.6 Gbits/sec 0 11.9 MBytes

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval Transfer Bandwidth Retr

[ 4] 0.00-10.00 sec 18.2 GBytes 15.6 Gbits/sec 0 sender

[ 4] 0.00-10.00 sec 18.2 GBytes 15.6 Gbits/sec receiver

But why do I only get 150MB/s (about 1.2Gbps) when transferring a large file (3.5GB) via SCP and RSYNC?

I don't think disk I/O is the problem, because I am transferring from and to a ramdisk.
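To put the two measurements on the same scale, here is a small worked conversion. It is only arithmetic on the figures quoted in this post (150 MB/s for the copy, 15.6 Gbits/sec from iperf3, a 3.5 GB file), assuming decimal units (1 GB = 1000 MB); the function name is made up for illustration.

```python
def mbytes_per_s_to_gbits_per_s(mbytes_per_s):
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mbytes_per_s * 8 / 1000


copy_rate_mb_s = 150      # observed SCP/RSYNC rate from the post
iperf_gbps = 15.6         # iperf3 result from the post
file_size_gb = 3.5        # file size from the post

copy_gbps = mbytes_per_s_to_gbits_per_s(copy_rate_mb_s)
fraction_of_link = copy_gbps / iperf_gbps
copy_seconds = file_size_gb * 1000 / copy_rate_mb_s
ideal_seconds = file_size_gb * 8 / iperf_gbps

print(f"copy rate: {copy_gbps:.1f} Gbit/s")
print(f"fraction of iperf3 rate: {fraction_of_link:.1%}")
print(f"time at copy rate: {copy_seconds:.1f} s, at iperf3 rate: {ideal_seconds:.1f} s")
```

So the copy uses under a tenth of what the link demonstrably carries. Since the network and the ramdisk both look fine, one thing worth ruling out (an assumption on my part, not something this thread establishes) is a CPU-bound step in the copy tool itself, such as SSH encryption.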

I appreciate your help. Thank you very much.

↧

Windows 2016 Storage Spaces Direct over IPoIB

August 20, 2018, 6:33 am

Hello,

I am in need of some assistance regarding Ethernet vs. InfiniBand IPoIB and lossless networks.

We have a 3-node Windows 2016 Storage Spaces Direct cluster that was set up early last year, when documentation on S2D was still fairly sparse. We used InfiniBand IPoIB instead of Ethernet because we have been using it for years to connect our Hyper-V clusters to our Windows storage SANs. The S2D setup is hyper-converged; storage and hypervisors are separate, so the storage data and the VM/Ethernet traffic are not on the same network.

We currently have a case open with Microsoft related to the Windows Server May rollup, which caused a problem with a VD after a server restart. The MS engineers have stressed that everything in the networking must be perfect to create a lossless network, including the RoCE and QoS setup.

Since we are using IPoIB, this has raised the question of whether our configuration is correct. Does InfiniBand IPoIB provide the resiliency needed for S2D traffic?

Please excuse me if the question seems too simple. I have been reading about RoCE and IPoIB for a couple of days and I think all the info is confusing me.

One added factor: since S2D was new at the time and there were a variety of unknowns, we included 4x 56Gb ports (2x MCX354A-FCBT) in each node. The intent was to over-spec the network to reduce the possibility of congestion.

Thanks,

Todd

↧

Re: Web interface error on SX6036

August 20, 2018, 7:18 am

Hi Andrew,

Can you provide the version of Mellanox OS running on the switch?

Thanks,

Pratik

↧

Remote VTEP mac learning is not working

August 20, 2018, 9:59 am

I'm trying a VXLAN configuration with the above topology, with Mellanox switches (running Mellanox OS) as leaves and a Cisco N9k as the spine. Both hosts are configured with VLAN 10 tagging. The loopbacks on the leaf switches are reachable via the spine. swp16 is configured as the NVE port, and VLAN 10 is bridged to VNI 10000 on both leaves. This is a controller-less configuration: remote VTEPs are added manually using the CLI, and remote learning is enabled using the commands below.

protocol nve

interface nve 1

interface nve 1 vxlan source interface loopback 1

interface ethernet 1/16 nve mode only force

interface nve 1 nve bridge 10000

interface ethernet 1/16 nve vlan 10 bridge 10000

no interface nve 1 nve fdb flood load-balance

interface nve 1 nve fdb flood bridge 10000 address 3.3.3.3

interface nve 1 nve fdb learning remote

But the hosts are not able to ping each other. What could be the problem here?

I can see that the VTEP on each switch has learned the MAC address of its directly connected host, but it is unable to learn the MACs of the hosts behind the remote VTEP. I used the command below to check the learned MACs.

show interface nve 1 mac-address-table

Also, the NVE counters increase when host2 is pinged from host1, but no packets are going out of swp2.

show interface nve 1 counters

↧

Re: Problem installing MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-x86_64

August 20, 2018, 6:18 pm

Hi Sebastian,

1) Have you verified, based on the release notes (RN) of the drivers, that the following packages are installed:

apt-get install perl dpkg autotools-dev autoconf libtool automake1.10

automake m4 dkms debhelper tcl tcl8.4 chrpath swig

graphviz tcl-dev tcl8.4-dev tk-dev tk8.4-dev bison flex dpatch

zlib1g-dev curl libcurl4-gnutls-dev python-libxml2 libvirt-bin

libvirt0 libnl-dev libglib2.0-dev libgfortran3 automake m4

pkg-config libnuma logrotate ethtool lsof

2) Did you try to install the latest driver version, 4.4-2.0.7.0?

3) Can you run it with the following options:

./mlnx_add_kernel_support.sh --make-tgz -t /var/tmp/MOFED -k `uname -r` -s /usr/src/kernels/`uname -r` -m . -n MLNX_OFED_LINUX-4.4-2.0.7.0-ubuntu18.04-x86_64-`uname -r` -v

Possibly add: --distro ubuntu18.04

Sophie.

↧

Re: when using write op with more than 1024B(MTU) in softroce mode，the operation fail

August 20, 2018, 6:33 pm

Hi Tianyu,

Have you properly configured Soft-RoCE, whether with the upstream driver or the Mellanox OFED driver?

See reference links below:

HowTo Configure Soft-RoCE

How to configure Soft-RoCE with Mellanox OFED 4.x

Also, your original statement is confusing or contradicts itself:

when my write opcode with length=1024, it is ok. but when length=1025 in the same code, it will fail.

when the same code with length=1024 or 1025 run using mellanox CX4 card, it is ok >>> Apparently working.
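For context on why 1024 vs. 1025 bytes is a meaningful boundary: on a reliable connection, an RDMA WRITE whose payload fits in one path MTU goes out as a single "ONLY" packet, while anything larger is segmented into FIRST / MIDDLE / LAST packets. The sketch below just computes that split. It assumes the path MTU is 1024 bytes, as the thread title suggests, and the function name is hypothetical; it illustrates the segmentation rule and does not diagnose the failure itself.

```python
import math


def rdma_write_packets(length, path_mtu):
    """Return the packet opcode sequence for an RDMA WRITE of `length` bytes.

    A payload that fits in one path MTU is sent as a single ONLY packet;
    a larger payload is split into FIRST, zero or more MIDDLE, and LAST.
    """
    if length <= path_mtu:
        return ["RDMA_WRITE_ONLY"]
    count = math.ceil(length / path_mtu)
    return (["RDMA_WRITE_FIRST"]
            + ["RDMA_WRITE_MIDDLE"] * (count - 2)
            + ["RDMA_WRITE_LAST"])


PATH_MTU = 1024  # assumed from the thread title

for length in (1024, 1025, 4096):
    print(length, rdma_write_packets(length, PATH_MTU))
```

With these assumptions, length=1024 is the last size that needs only one packet, and length=1025 is the first that exercises the multi-packet path, which is consistent with the reported symptom appearing exactly at that point.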

Sophie.

↧

Re: How to configure host chaining for ConnectX-5 VPI

August 21, 2018, 11:56 pm

Hi,

I have a problem pinging between the NICs. This is my configuration:

SERVER 1: PORT1:192.168.10.10 PORT2: 192.168.10.11

SERVER 2: PORT1:192.168.10.12 PORT2: 192.168.10.13

SERVER 3: PORT1: 192.168.10.14 PORT2: 192.168.10.15

mlxconfig -d mt4119_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

mlxconfig -d mt4119_pciconf0 set HOST_CHAINING_MODE=1

mlxfwreset --device mt4119_pciconf0 reset

All commands work perfectly, but I can only ping the ports that are directly interconnected; I need to be able to ping all ports.

Is my configuration correct?

↧

Re: How to configure host chaining for ConnectX-5 VPI

August 22, 2018, 5:52 am

That config looks correct. I'm being that guy... I'd be tempted to do a full machine restart.

Make sure you've issued those commands to the other servers, and done a restart to solidify the config.

I haven't used the mlxfwreset command, but looking at the docs, without the level argument, it is only doing the lowest level of what the adapter supports.

A physical 'shutdown -r now' has always worked for me.

↧

Re: RoCE v2 configuration with Linux drivers and packages

August 23, 2018, 2:15 pm

Thank you! I was able to configure and run it; the problems I had were with the i40e and i40iw drivers.

↧

Factors that determine compatibility of SFPs with new fibre services?

August 24, 2018, 12:00 am

Whilst I understand that product recommendations are off topic, can anyone help by explaining what the critical factors are when looking for SFPs that are going to be compatible with a new service?
Is wavelength a defining factor that should be considered/matched, or should anything else be used to guide selection?

Sorry, I am new to 10GBASE-SR and I can't seem to find a good resource that can confirm whether an SFP supported in a Cisco Nexus 5548UP will be compatible with a new service. The new service is described as '10 Gigabit Ethernet LAN PHY, IEEE 10GBASE-LR, 10.3125 Gbps +/- 100 ppm, 1310 nm'.
Ultimately I need to understand whether a 'cisco sfp-10g-sr', for which the transmitter wavelength spec is described as 850 nm, is usable.
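As a rough way to frame the comparison, the sketch below treats two optical interfaces as a match only when both the nominal wavelength and the fiber type agree. The nominal values used are the standard ones (10GBASE-SR: 850 nm over multimode fiber; 10GBASE-LR: 1310 nm over single-mode fiber). The class and function names are made up for illustration, and this is a deliberately simplified check: reach, connector type, and whether the switch accepts a given module also matter in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticalPhy:
    name: str
    wavelength_nm: int
    fiber: str  # "multimode" or "single-mode"


# Nominal characteristics of the two standards mentioned in the post.
SR = OpticalPhy("10GBASE-SR", 850, "multimode")
LR = OpticalPhy("10GBASE-LR", 1310, "single-mode")


def optically_compatible(a, b):
    """Simplified check: both ends must use the same wavelength and fiber type."""
    return a.wavelength_nm == b.wavelength_nm and a.fiber == b.fiber


service = LR          # the new service as described in the post
candidate = SR        # the 'cisco sfp-10g-sr' module from the post

print(candidate.name, "vs", service.name, "->",
      optically_compatible(candidate, service))
```

Under this simplified model, an 850 nm SR module does not match a service specified as 10GBASE-LR at 1310 nm, so the wavelength and fiber type quoted by the provider are the first things to match.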

Thanks for your patience. I plan to buy one; is there any site you would recommend?

↧

MSX1012B MSX6012F

August 24, 2018, 3:42 am

Hi,

I have two switches that I am very pleased with: an MSX1012B-2BFS and an MSX6012F-1BRS_WT (as shown in the web console).

I have to put one switch at one site and the other at another site, with 6 kilometers between the sites.

So I want to install an MC2210511-LR4 module in each switch.

The optical fiber between the sites has been tested.

My questions are:

1. Will this optical QSFP+ module (40GbE) work in each switch? If so, on which port? Or will any port be OK?

2. What do the letters "WT" on the MSX6012 model mean? (Wide Transceiver?) If so, does that mean that I need to buy another MSX6012F_WT, or will the MSX1012B also work fine?

3. Currently, my MSX6012F is in VPI profile mode. Do I need to put it in single_ethernet mode? (I don't run InfiniBand on the network for now, but it is planned.)

4. As our needs are evolving, and my two switches already have almost all their ports in use, I plan to buy two MSX6036F switches to replace the MSX1012B and MSX6012F. Will both MC2210511-LR4 modules work in them and, if so, on which ports?

Many thanks for your help !

Regards.

↧

Re: Problem installing MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-x86_64

August 25, 2018, 10:19 am

Hi Sophie,

I double-checked the packages. Instead of tcl8.4 I have tcl8.6 installed, and instead of libnuma I have libnuma1. Could this be the issue?

Version 4.4-2.0.7.0 gives me the same error messages as posted before. I can't even get mlnxofedinstall to finish properly.

What additional information can I provide so you can help me?

Thank you for your reply!

↧