New iSER Driver installation on ESXi 6.5-U1

September 14, 2017, 1:25 am

Hello,

We have tried to install the new iSER driver (MLNX-NATIVE-ESX-ISER_1.0.0.1-10EM-650.0.0.4598673.zip) on ESXi 6.5 U1.

But only the software iSCSI shows up in the storage adapter list. The iSER adapter is not visible.

The configuration is as follows:

Hardware:

- Servers: PowerEdge R720, Cisco C-240 M3 and Lenovo RD650, with 128 to 192 GB RAM
- Dual Xeon
- Mellanox ConnectX-3 MCX354A-FCBT adapters

Software:

- ESXi 6.5-U1 Build 5969303
- Mellanox native drivers 3.16.0.0-1vmw.650.0.0.4564106

The iSER driver installs successfully, and we issued the following command to make sure the driver is loaded at boot:

# esxcli system module set --enabled=true --module=iser

All the hosts boot normally and

# esxcfg-module -g iser

Gives:

iser enabled = 1 options = ''

However, as a check, after booting we issue the following command for the first time:

# vmkload_mod iser

We get:

Module iser loaded successfully

This leads us to believe that the module was not loaded automatically.

Running the same command a second time gives:

vmkload_mod: Can not load module iser: module is already loaded

On all 5 hosts we are not able to see the iSER adapter even after several reboots.

The network adapters are connected to a vDS.

Is there any clue, or a procedure to cleanly reinstall everything, that would bring up the iSER adapter?

Thanks!

Re: ConnectX-3 WinOF 5.35 on Win2016 Multiple Partitions

September 14, 2017, 5:38 am

Both nodes run Windows Server 2016. The SM has the partition configured with mtu=5 (4096 bytes), and the Windows driver side is also configured with 4096. In fact, later versions of the Windows driver log a warning in the event log if the partition MTU does not match the driver MTU. From my testing, I know for sure that FW 2.36.5150 works perfectly with driver 5.25.12665. FW 2.40.7000 works with neither driver 5.25 nor 5.35. I have yet to check whether FW 2.36.5150 works with driver 5.35. The server is an older unit based on the Intel Tylersburg chipset.

Re: How to test RDMA traffic congestion

September 14, 2017, 6:57 am

Here are a few steps to help analyse your congestion problem:

What is IB congestion?

IB congestion is a situation where nodes fail to send data or their send rate decreases.
In most cases, an IB network experiencing congestion does not drop packets; it just becomes slow.
Usually, IB congestion is caused by a slow receiving node.

It can also be caused by the network itself, when the network is blocking by design or because of a fault.

How to identify a congestion situation:

The network is slow: the packet rate of all or some of the nodes decreases dramatically.
There are no packet drops in the fabric. If the network drops packets, it is probably not real congestion but a physical problem that should be identified and fixed locally.

Suspect #1: Physical Layer Issues

➢ ibdiagnet diagnostic

Physical layer issues can cause degraded performance of the fabric. In order to eliminate any impact on the fabric by physical layer issues, fabric cleanup is required.

Fabric status and port counter information can be collected using the ibdiagnet tool (run it from the UFM server, where the ibdiagnet2 version is installed):

ibdiagnet -r -pc -P all=1 --pm_pause_time 600 -o <output_dir>

Specifying the output directory is recommended so that files do not get overwritten.
The output files are also used in other sections of this guide.

In the ibdiagnet2.log file, look for ports reporting one or more of the following physical layer issues:

link_down_counter – ignore increments caused by scheduled server reboots

-E- lid=0x0143 dev=51000 xxxxxxxx/U1/P36

Performance Monitor counter : Value

link_down_counter : 3 (threshold=0)

Degraded link speed and width – links with reduced capability are reported in the “Speed / Width checks” section

Speed / Width checks

-I- Link Speed Check (Compare to supported link speed)

-E- Links Speed Check finished with errors

-E- Link: S0002c902004213d3/N0002c902004213d0(Infiniscale-IV Mellanox Technologies)/P24<-->switch-1137be:IS5030/U1/P32 - Unexpected actual link speed 2.5

-I- Link Width Check (Expected value given = 4x)

-E- Links Width Check finished with errors

-E- Link: S0002c902004213d3/N0002c902004213d0(Infiniscale-IV Mellanox Technologies)/P24<-->switch-1137be:IS5030/U1/P32 - Unexpected width, actual link width is 1x

link_error_recovery_counter

-E- lid=0x0009 dev=51000 xxx/U1/P32

Performance Monitor counter : Value

link_error_recovery_counter : 255 (overflow)

max_retransmission_rate – check for increments during the test run. Look for anything greater than a threshold of 500 (the threshold in the example below is set by the ibdiagnet flag “-P all=1”)

-E- Ports counters Difference Check (during run) finished with errors

-E- Sf4521403004d20a0/r xxx/P6 - "max_retransmission_rate" increased during the run (difference value=1,difference allowed threshold=1)

symbol_error_counter – relevant only for non-FDR/FDR10 links

-E- lid=0x016e dev=23131 S0008f1040040c018/N0008f10500650e4e/P30

Performance Monitor counter : Value

symbol_error_counter : 65535 (overflow)
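The log checks above can also be scripted. The following Python sketch is a hypothetical helper, not part of ibdiagnet: it assumes ibdiagnet2.log uses the "-E- lid=... dev=... <port>" header and "<counter> : <value> (<note>)" layout shown in the excerpts, and simply collects the three physical-layer counters discussed here for each reported port.

```python
import re

# Hypothetical helper (not part of ibdiagnet). Assumes the log layout
# shown in the excerpts above.
HEADER = re.compile(r"^-E- lid=(0x[0-9a-fA-F]+) dev=(\d+) (\S+)")
COUNTER = re.compile(r"^(\w+)\s*:\s*(\d+)\s*\((.*)\)")

# Physical-layer counters discussed in this section.
WATCHED = {"link_down_counter", "link_error_recovery_counter",
           "symbol_error_counter"}


def find_physical_errors(lines):
    """Return (lid, port, counter, value, note) tuples for watched counters."""
    findings = []
    current = None  # (lid, port) of the most recent port header
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = (header.group(1), header.group(3))
            continue
        if line.startswith("-E-"):
            current = None  # another kind of error block, not a port header
            continue
        counter = COUNTER.match(line)
        if counter and current and counter.group(1) in WATCHED:
            findings.append((current[0], current[1], counter.group(1),
                             int(counter.group(2)), counter.group(3)))
    return findings
```

For example, pass the lines of <output_dir>/ibdiagnet2.log to find_physical_errors(). The sketch cannot tell whether a link_down_counter increment came from a scheduled reboot, so those hits still need to be checked by hand.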

➢ UFM Port Counters CSV diagnostic

Configure UFM to collect PortCounters CSV files in the gv.cfg configuration file:

[CSV]

max_files= 5

write_interval= 30

ext_ports_only= no

Output files will be saved in this location on the UFM server: /opt/ufm/files/csv/.

Extract the latest file and open it in Excel
Format the data as a table
Relevant columns for physical layer issues:
1. E: Width – look for any port without 4x width
2. T: SymErr – SymbolError. Relevant for non-FDR/FDR10 links
3. U: LinkRecovers
4. V: LinkDowned
5. AY: Speed – look for any degraded rate
6. AZ: Status – look for anything not OK

Device name and port can be found in columns P and B respectively.
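As an alternative to filtering the table by hand in Excel, a Python sketch along these lines could flag the same symptoms. It assumes the CSV header names match the column titles quoted above (Width, SymErr, LinkRecovers, LinkDowned, Status); that is an assumption, so check the header row of your own file first. Speed is left out because the expected rate depends on your fabric.

```python
import csv
import io


# Sketch only: header names are ASSUMED to match the column titles listed
# above. Returns each suspicious row together with the reasons it was flagged.
def flag_physical_issues(csv_text, expected_width="4x", check_symerr=True):
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        reasons = []
        if row["Width"].strip() != expected_width:
            reasons.append("width")
        # SymErr is only meaningful for non-FDR/FDR10 links.
        if check_symerr and int(row["SymErr"] or 0) > 0:
            reasons.append("symbol errors")
        if int(row["LinkRecovers"] or 0) > 0:
            reasons.append("link recovers")
        if int(row["LinkDowned"] or 0) > 0:
            reasons.append("link downed")
        if row["Status"].strip().upper() != "OK":
            reasons.append("status")
        if reasons:
            flagged.append((row, reasons))
    return flagged
```

Setting check_symerr=False is meant for FDR/FDR10 fabrics, where the SymErr column is not relevant.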

Suspect #2: Unresponsive node/s issue

Look for nodes that do not respond to fabric MADs. Nodes can end up in this state if there is any issue with the OS, driver or card firmware. Once identified, the unresponsive nodes should not participate in any job in the fabric.

If there are any unresponsive nodes in the fabric, we can find them by invoking one of the direct path commands such as iblinkinfo, ibnetdiscover, ibswitches, ibhosts, ibnodes, etc.

Run one of the direct path commands: iblinkinfo/ibnetdiscover/ibswitches/ibhosts/ibnodes
If there are unresponsive nodes in the fabric, you will get one “Connection timed out” line per unresponsive node at the start of the command output, including the direct path to that node

Example:

root # ibnetdiscover

src/query_smp.c:197; umad (DR path slid 0; dlid 0; 0,1,18 Attr 0xff90:2) bad status 110; Connection timed out

src/query_smp.c:197; umad (DR path slid 0; dlid 0; 0,1,17 Attr 0xff90:2) bad status 110; Connection timed out

# Topology file: generated on Mon Mar 2 17:19:19 2016

# Initiated from node f4521403008b9a30 port f4521403008b9a31

Identify the unresponsive node/s:
1. From the same node where the direct path command was invoked, run:

smpquery nd -D <direct_path_without_last_number>

Example: for direct path "0,1,18" invoke: "smpquery nd -D 0,1"

The unresponsive device is connected to the device reported in the previous step, on the port given by the last number in the direct path

Example: for direct path "0,1,18", the unresponsive device will be connected to port 18
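The lookup above is mechanical, so it can be sketched in Python: the helper below (a hypothetical name) extracts the direct path from each "Connection timed out" line, drops the last hop to build the smpquery command, and keeps the last hop as the port number. The regular expression assumes the exact "DR path slid ...; dlid ...; <path> Attr" format of the example output.

```python
import re

# Matches the direct path in lines such as
# "umad (DR path slid 0; dlid 0; 0,1,18 Attr 0xff90:2) ... Connection timed out"
DR_PATH = re.compile(r"DR path slid \d+; dlid \d+; ([\d,]+) Attr")


def unresponsive_hops(output_lines):
    """Return (smpquery command, port) for each timed-out direct path."""
    results = []
    for line in output_lines:
        if "Connection timed out" not in line:
            continue
        match = DR_PATH.search(line)
        if not match:
            continue
        hops = match.group(1).split(",")
        # Query the device one hop short of the unresponsive node...
        parent_path = ",".join(hops[:-1])
        # ...and report the last hop as the port it is connected to.
        results.append((f"smpquery nd -D {parent_path}", int(hops[-1])))
    return results
```

For the example ibnetdiscover output above, this yields "smpquery nd -D 0,1" with ports 18 and 17.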

Suspect #3: Slow Receivers

Slow receivers are nodes that push back on data because they cannot process it fast enough.
A slow node does not give the switch credits to send traffic. The backpressure then spreads to other connected switches as buffer space is allocated for delayed traffic.

Congested links:

An indication of a congested link is a link that sends or receives a large amount of data (high XmitPackets/RcvPackets) while also showing high XmitWait rates.
We get a clear indication of congestion if: XmitWait / XmitPackets > 10

(i.e., the ratio between XmitWait and XmitPackets is greater than 10)

Possible causes for slow receiver:

Server resources:
- CPU speed – it is recommended to run the CPU in maximum performance mode
- Memory – a bad memory DIMM or memory section can decrease server performance. This can only be detected with low-level memory testing utilities
PCI connection – degraded Gen (speed) and/or width

More information can be found in the Performance Tuning Guide document.

➢ Detecting slow receivers using the PortCounters CSV file

To use this method, the reset counters policy should be reset_every_poll (only data counters are reset).

Extract the 2 latest CSV files (by naming convention)
Open the 2 files in Excel and format them as tables
Copy the XmitWait column from the older file to the newer file, right next to the newer file's XmitWait column
Insert a new column (NEW_XmitWait) and calculate the delta between the 2 XmitWait values (we want the number of ticks counted between the 2 files)
In column D (NodeType), select only Switch
In column AR (PeerPlatform), select only Computer
Insert a new column, Congestion Ratio, with the formula: NEW_XmitWait/XmitPkts
Sort the Congestion Ratio column from largest to smallest
Starting from the top, look at any transmitting port reporting a ratio greater than 10
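The Excel procedure above can be sketched in Python as well. The column names NodeType, PeerPlatform, XmitWait and XmitPkts come from the steps; KEY_COLS, the columns used to match the same port across the two files, is a hypothetical choice that must be adjusted to your CSV. Following the steps, XmitWait is differenced between the files, while XmitPkts is taken from the newer file alone.

```python
import csv
import io

# HYPOTHETICAL: columns that identify one port in both CSV files.
# Replace them with whatever your PortCounters CSV actually provides.
KEY_COLS = ("NodeGUID", "Port")


def load_ports(csv_text):
    """Index the rows of one PortCounters CSV by port key."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[c] for c in KEY_COLS): row for row in rows}


def congested_ports(older_csv, newer_csv, threshold=10.0):
    """Switch ports facing computers whose congestion ratio exceeds threshold.

    As in the steps above, XmitWait is differenced between the two files,
    while XmitPkts is read from the newer file only (with reset_every_poll
    the data counters cover just the last poll interval).
    """
    older, newer = load_ports(older_csv), load_ports(newer_csv)
    hits = []
    for key, row in newer.items():
        # Same filters as the Excel steps: NodeType=Switch, PeerPlatform=Computer.
        if row["NodeType"] != "Switch" or row["PeerPlatform"] != "Computer":
            continue
        if key not in older:
            continue
        new_xmit_wait = int(row["XmitWait"]) - int(older[key]["XmitWait"])
        xmit_pkts = int(row["XmitPkts"])
        if xmit_pkts == 0:
            continue  # nothing transmitted, ratio undefined
        ratio = new_xmit_wait / xmit_pkts
        if ratio > threshold:
            hits.append((key, ratio))
    # Largest ratio first, like sorting the Congestion Ratio column.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```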

➢ Detecting slow receivers using ibdiagnet2

With this method, manual mapping between GUIDs and hostnames is required.

This can be done using the Excel VLOOKUP function and any parsed hostname <-> GUID list.

Copy the “PM_INFO” data from the ibdiagnet2.db_csv file to an Excel sheet and format it as a table.

Calculate the Congestion index = XmitWait / XmitPkt

Counters may be 32-bit or 64-bit; 64-bit counters require additional conversion from hex to decimal.

Complete the data and analyse the results:

Congestion index: Normalized XmitWait [ticks] = ∆XmitWait / ∆XmitPackets

(the average number of ticks a packet waits at the head of the queue)

Ports with Congestion index >= 10 should be treated as congested
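For the ibdiagnet2 method, the hex-to-decimal conversion and the congestion index can be sketched as follows. This assumes the 64-bit values appear as hex strings with a "0x" prefix and the 32-bit values as plain decimals; reading ibdiagnet2.db_csv itself is left out, and the function names are illustrative only.

```python
# Sketch only: names are illustrative. Assumes 64-bit counters are
# exported as hex strings with a "0x" prefix and 32-bit counters as
# plain decimal numbers.
def to_int(value):
    """Convert a counter value (hex "0x..." or decimal) to an int."""
    text = str(value).strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text)


def congestion_index(wait_before, wait_after, pkts_before, pkts_after):
    """Normalized XmitWait = dXmitWait / dXmitPackets, in ticks per packet."""
    d_wait = to_int(wait_after) - to_int(wait_before)
    d_pkts = to_int(pkts_after) - to_int(pkts_before)
    if d_pkts <= 0:
        return None  # no packets sent in the interval
    return d_wait / d_pkts


def is_congested(index, threshold=10):
    """Ports with a congestion index >= threshold are treated as congested."""
    return index is not None and index >= threshold
```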

Suspect #4: Network issues

➢ Routing issue

Routing issues can be investigated by Mellanox support using the following information:

ibdiagnet output files
OpenSM log
OpenSM configuration files (/opt/ufm/files/conf/opensm/)
ibnetdiscover output
partitions.conf
/opt/ufm/files/log/opensm-sa.dump
Root GUIDs file

➢ Topology change

Using MSTK:

Missing links or devices can cause degradation in performance.

You can use the /opt/ufm/support/MSTK5.5/Linux/Host-Tools/ib-topology-viewer.sh script on the UFM server to back up a reference topology summary and compare it with any newly collected topology summary.

[root@xxx Host-Tools]# ./ib-topology-viewer.sh

ib-topology-viewer.sh Version 5.5

MF0;xxx:SX6036/U1(0x0002c903004693c1) 1 HCA ports and 2 switch ports.

SwitchIB Mellanox Technologies(0x7cfe9003009ea930) 2 HCA ports and 3 switch ports.

SwitchIB Mellanox Technologies(0x7cfe900300bf8530) 1 HCA ports and 1 switch ports.

Using ibnetdiscover:

Cache ibnetdiscover data – this will be the reference data:

ibnetdiscover --cache <file>

Compare any new ibnetdiscover to the cached data:

ibnetdiscover --diff <cache_file>

The output will contain the changes between the cached data and the new ibnetdiscover output.

Marc

Re: IB Switch IS5035 MTU Setting?

September 14, 2017, 7:27 am

The MTU of all Mellanox embedded SMs is 2K (2048 - 4 = 2044).

Therefore you should change the default partition's MTU to 4K (4096 - 4 = 4092).
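The arithmetic in parentheses is the 4-byte IPoIB header subtracted from the InfiniBand MTU. Below is a small Python sketch of the mapping from the SM's MTU codes (such as the mtu=5 mentioned earlier in this thread) to bytes and to the resulting IPoIB MTU; it assumes IPoIB datagram mode, where this 4-byte overhead applies.

```python
# InfiniBand MTU codes (as used by the SM, e.g. "mtu=5") and their sizes.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}

# IPoIB encapsulation header subtracted in datagram mode.
IPOIB_HEADER_BYTES = 4


def ipoib_mtu(mtu_code):
    """IPoIB (datagram mode) MTU for a given InfiniBand MTU code."""
    return IB_MTU_BYTES[mtu_code] - IPOIB_HEADER_BYTES


for code in (4, 5):
    print(f"mtu={code}: IB MTU {IB_MTU_BYTES[code]}, IPoIB MTU {ipoib_mtu(code)}")
```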

Best regards,

Jae-Hoon Choi

Re: New iSER Driver installation on ESXi 6.5-U1

September 14, 2017, 7:01 am

You must add the iSER adapter from the ESXi console; it is not the software iSCSI adapter.

If you run the esxcli rdma iser add command, you will see an iSER adapter.

But if you reboot the ESXi host, the iSER adapter disappears...

Therefore you should add the esxcli rdma iser add command to /etc/rc.local.d/local.sh.

I also tested with my SCST iSER target, but failed to connect to it.

A friend of mine also tested with LIO and StarWind iSER targets, but failed to connect to the iSER target every time.

My friend successfully uses iSER on an ESXi 6.0 host with the 1.9.10.5 driver for ESXi 6.0 (which supports only ESXi 6.0).

I think this iSER driver 1.0.0.1 for ESXi 6.5 is a beta-level add-on module... :(

Best regards,

Jae-Hoon Choi

Re: Question about ESXi 6.5 iSER driver with PFC port configuraion.

September 14, 2017, 7:55 am

Here are some good articles Mellanox has published on how to configure PFC on Mellanox switches:

For SwitchX:

HowTo Enable PFC on Mellanox Switches (SwitchX)

For Spectrum switches:

How to Enable PFC on Mellanox Switches (Spectrum)

As for the other issues you raised (the iSER storage adapter disappearing after every ESXi 6.5 host reboot, and the iSER initiator failing to connect to SCST), I believe these require a full troubleshooting session with logs and dumps, so I suggest that you contact support@mellanox.com, describe the problems in detail and get advice from the relevant Mellanox support engineers.

Re: How to test RDMA traffic congestion

September 14, 2017, 5:19 pm

Hi Marc,

Thank you for the comprehensive answer; however, please note that this is a RoCE (v1) fabric, i.e., there is no IB link layer, so almost none of the troubleshooting tips you provided apply directly. I would dearly love to see the same sort of guide for RoCE.

Cheers,

Blair

Question about ESXi 6.5 iSER PFC direct connection

September 14, 2017, 6:18 pm

I have ConnectX-4 cards that are currently directly connected to each other. One host is running an SCST storage target and the other is running ESXi 6.5 U1. I see the iSER driver has been released, and I noted the PFC configuration mentioned. Since there is no switch between the cards (they are directly connected), is PFC needed? If so, is there anything I could configure on the NICs themselves to meet this requirement?

Re: RoCE not working on Win 2016 (ConnectX-3 Pro)

September 14, 2017, 10:35 pm

Firmware version 2.42.500 is now out.

I can't see any mention of this issue in the release notes, either under fixes or under known bugs.

Has anyone been able to test it?

Re: New iSER Driver installation on ESXi 6.5-U1

September 15, 2017, 1:21 am

Thanks, Jae-Hoon!

I can now see the iSER adapters.

Will try to hook the host to some iSER targets later and post the results here.

I hope it can work with the switch ports configured with PFC instead of Global Pause.

solution for design small HPC

September 15, 2017, 2:39 am

We have 35 HP G9 and 40 HP G8 servers. Could you help us with a solution to build one or two HPC clusters?

Re: Issues with setting up Storage Spaces Direct

September 15, 2017, 7:44 am

Before running tests that involve the I/O system, did you verify the following?

* Does TCP/IP connectivity work (ping)?

* Are the nd_write_bw/nd_read_bw tests working?

* Are you able to run the nd_XXXX tests on a single machine, using two shell windows to run the server in one and the client in the other?

* What failure do you get when the RDMA test fails?

Re: New iSER Driver installation on ESXi 6.5-U1

September 15, 2017, 9:28 am

We're in the same boat.

I want to use PFC-based iSER in an ESXi 6.5 environment.

Here are some threads:

Question about ESXi 6.5 iSER driver with PFC port configuraion.

iSER driver for ESXi 6.5 - Does it support ConnectX-3 or not

Good luck to us who want to use iSER on ESXi 6.5.

Best regards,

Jae-Hoon Choi

Re: Question about ESXi 6.5 iSER PFC direct connection

September 15, 2017, 9:33 am

The iSER protocol (over Ethernet) is based on RoCE.

RoCE needs a lossless network based on PFC or Global Pause (GP).

Unfortunately, you can't do that.

I think you should purchase a PFC-capable Ethernet switch.

Best regards,

Jae-Hoon Choi

Re: Question about ESXi 6.5 iSER PFC direct connection

September 15, 2017, 10:36 am

Are you sure this won't work at all? Since they are directly connected, there is no real way for outside influences to affect the throughput. I would assume that with two cards directly connected, one cannot outsend the other, etc. I did the same thing with SRP without issue, and I am currently doing it with iSCSI as well. I will try anyway, but I am holding off for now, since everyone is reporting that they cannot get iSER to see their SCST or LIO targets anyway.

Re: Question about ESXi 6.5 iSER PFC direct connection

September 16, 2017, 1:10 am

First of all, I'm not a Mellanox employee.

But some information on this site gives me the following hints.

01. ESXi 6.5 native driver

The ESXi 6.5 native driver can't support vKernel drivers such as SRP, IPoIB and InfiniBand-based iSER.

Therefore, officially, only the Ethernet driver will be supported in the future.

02. InfiniBand-based SRP vs. iSER

SRP just needs an SM running on the target or the initiator.

Therefore you can build an SRP fabric with a direct connection between HCAs.

The same goes for InfiniBand-based iSER!

03. Ethernet-based (RoCE) iSER

RoCE requires a lossless Ethernet fabric.

For example: DCB(X), ETS.

- These are extensions of Ethernet.

All Converged Ethernet protocols, such as FCoE and RoCE, rely on a switched fabric only.

Ethernet iSER needs a RoCE network.

That's a basic requirement.

Best regards,

Jae-Hoon Choi

Re: How to test RDMA traffic congestion

September 16, 2017, 11:25 pm

Hi,

For monitoring and analysing your network, I would suggest a new open tool provided by Mellanox called NEO. It is a network management interface to analyse, monitor and diagnose your Ethernet network, and it also works for RoCE.

Have a look at:

http://www.mellanox.com/page/products_dyn?product_family=278&mtag=neo_host_sw

Regards

Marc

Re: Question about ESXi 6.5 iSER PFC direct connection

September 18, 2017, 7:59 am

Hmm, I wonder if there is any software I can run on the Linux target to emulate this functionality. Is there a true need for PFC in a NIC-to-NIC connection, though?

Re: Question about ESXi 6.5 iSER PFC direct connection

September 18, 2017, 9:55 am

Here is a basic question about your environment.

If you want to use Ethernet iSER, you need a PFC-enabled Ethernet configuration.

That means you must configure the Ethernet port on your Linux target with PFC.

It also needs PFC configured on the Ethernet switch, too!

Flow control for lossless Ethernet traffic is a basic component of an iSER or RoCE fabric.

Best regards,

Jae-Hoon Choi

Re: Question about ESXi 6.5 iSER PFC direct connection

September 18, 2017, 10:27 am

There is no switch in the storage path in this case, just on the data network. The lldpad daemon may work. I haven't started testing, because I see that users with normal RoCE setups are having problems connecting to SCST, so I didn't see a need to try my setup until the normal setups work =)
