Running RoCE over L2 Network Enabled with PFC via Mellanox

Goals

  • Follow the guide "HowTo Run RoCE and TCP over L2 Enabled with PFC" (2016)
  • Two hosts use a single network (VLAN 100) for all traffic
  • Each host runs two traffic flows:
    • RoCE flow (bypasses the kernel)
      • PFC is enabled on priority 4, which is used only by the RoCE application.
    • TCP flow (goes through the kernel)
      • TCP traffic is sent on priority 0, which has no PFC.

Environment

  • mlx-1.on.ec
    • Ubuntu 14.04.3
    • Mellanox ConnectX-3 Adapter 10GbE
      • Make sure the latest MLNX_OFED is installed, rather than MLNX_EN
    • VLAN 100 IP: 172.16.100.1/24
  • mlx-2.on.ec
    • Ubuntu 14.04.3
    • Mellanox ConnectX-4 Adapter 10GbE
      • Make sure the latest MLNX_OFED is installed, rather than MLNX_EN
    • VLAN 100 IP: 172.16.100.2/24
  • Switch
    • BigSwitch BCF / Cumulus Linux / PicOS

Install OFED Driver

# Build dependencies required by the MLNX_OFED installer
apt install -y gfortran make flex swig tk8.4 python-libxml2 libnl1 tcl8.4 autoconf dkms bison dpatch chrpath gcc libgfortran3 graphviz tk automake pkg-config autotools-dev quilt m4 tcl debhelper libltdl-dev
# Unpack the MLNX_OFED bundle and run its installer
tar zxvf MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64.tgz
cd ./MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64/
./mlnxofedinstall
# Load the new driver stack now and start it at every boot
/etc/init.d/openibd restart
update-rc.d openibd defaults
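After the installer finishes, it can help to confirm that the running stack matches the bundle you unpacked. The sketch below is a hypothetical helper (ofed_version is not part of MLNX_OFED). It assumes that the ofed_info utility, run with -s, prints a single line such as MLNX_OFED_LINUX-4.0-2.0.0.1: and reduces either that line or the bundle file name to the bare version string.

```shell
#!/bin/sh
# Hypothetical helper: reduce a MLNX_OFED bundle name or an `ofed_info -s`
# line to its bare version string, e.g. "4.0-2.0.0.1".
# Assumption: `ofed_info -s` prints one line like "MLNX_OFED_LINUX-4.0-2.0.0.1:".
ofed_version() {
  local s="${1%:}"            # drop the trailing colon printed by ofed_info -s
  s="${s#MLNX_OFED_LINUX-}"   # drop the product prefix
  echo "${s%%-ubuntu*}"       # drop the "-ubuntu14.04-x86_64.tgz" suffix of a bundle name
}

# Usage on a host (not run here):
#   [ "$(ofed_version "$(ofed_info -s)")" = \
#     "$(ofed_version MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64.tgz)" ] && echo "MLNX_OFED OK"
```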

Enable PFC on Priority 4

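# Enable PFC for priority 4 on transmit and receive (bitmask 0x10)
echo "options mlx4_en pfctx=0x10 pfcrx=0x10" >> /etc/modprobe.d/mlx4_en.conf
# Restart the driver stack so the module options take effect
/etc/init.d/openibd restart
# Verify: sysfs reports the masks in decimal, so print them in hex
for p in pfctx pfcrx; do printf "%s: 0x%x\n" "$p" "$(cat /sys/module/mlx4_en/parameters/$p)"; done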
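  • 0x10 = 00010000 in binary. Bit n enables PFC on priority n, so only priority 4 is enabled on that host.
  • The mlx4_en options apply to the ConnectX-3 on mlx-1.on.ec. The ConnectX-4 on mlx-2.on.ec is driven by mlx5_core, which does not read them; there, PFC is typically set with the mlnx_qos tool included in MLNX_OFED.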
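The pfctx/pfcrx values are per-priority bitmasks. The hypothetical helpers below (not part of MLNX_OFED) convert between a list of priorities and such a mask, which is handy when picking a different lossless priority or when checking the value read back from sysfs.

```shell
#!/bin/sh
# Hypothetical helpers: convert between PFC priorities and the pfctx/pfcrx
# bitmask, where bit n enables PFC on priority n.

# pfc_mask 4 -> 0x10 ; pfc_mask 3 4 -> 0x18
pfc_mask() {
  local mask=0 p
  for p in "$@"; do
    mask=$(( mask | (1 << p) ))
  done
  printf '0x%x\n' "$mask"
}

# pfc_priorities 0x10 -> 4 ; accepts hex or the decimal value shown in sysfs
pfc_priorities() {
  local mask=$(( $1 )) out="" p
  for p in 0 1 2 3 4 5 6 7; do
    if [ $(( mask & (1 << p) )) -ne 0 ]; then
      out="$out${out:+ }$p"
    fi
  done
  echo "$out"
}

# Usage on mlx-1.on.ec after the restart (should print 4):
#   pfc_priorities "$(cat /sys/module/mlx4_en/parameters/pfcrx)"
```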

Create VLAN 100 Interface and IP

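# Load the 802.1Q VLAN module now and at every boot
echo "8021q" >> /etc/modules
modprobe 8021q
# vconfig is provided by the vlan package
apt install -y vlan
# Create VLAN 100 on p3p1 and assign the host IP (use 172.16.100.2/24 on mlx-2.on.ec,
# and adjust p3p1 to that host's interface name)
vconfig add p3p1 100
ifconfig p3p1 up
ifconfig p3p1.100 172.16.100.1/24 up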
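vconfig and ifconfig are deprecated on newer distributions. As a hedged alternative, the hypothetical vlan_cmds helper below only prints roughly equivalent iproute2 commands for a given parent interface, VLAN ID and address, so they can be reviewed before being piped to a root shell. It assumes the ip command from iproute2 with VLAN support is installed.

```shell
#!/bin/sh
# Hypothetical helper: print (but do not run) iproute2 commands roughly
# equivalent to the vconfig/ifconfig steps above.
# Arguments: parent interface, VLAN ID, address in CIDR form.
vlan_cmds() {
  local parent=$1 vid=$2 cidr=$3
  local dev="$parent.$vid"
  printf '%s\n' \
    "ip link add link $parent name $dev type vlan id $vid" \
    "ip link set dev $parent up" \
    "ip addr add $cidr dev $dev" \
    "ip link set dev $dev up"
}

# Usage: review the commands, then apply them as root, e.g. on mlx-2.on.ec:
#   vlan_cmds p3p1 100 172.16.100.2/24
#   vlan_cmds p3p1 100 172.16.100.2/24 | sudo sh
```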

Set Egress Priority

Set Egress Priority 0 for TCP Traffic

# Map kernel (skb) priorities 0-7 to VLAN egress priority 0 on p3p1.100
for i in {0..7}; do vconfig set_egress_map p3p1.100 $i 0 ; done
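The loop above maps every kernel (skb) priority 0-7 to VLAN priority 0. With iproute2, the same mapping can be expressed through the egress-qos-map option of a VLAN link, which takes FROM:TO pairs. The hypothetical egress_map helper below builds that list; the ip link set ... type vlan egress-qos-map form in the usage comment is an assumption to check against your iproute2 version.

```shell
#!/bin/sh
# Hypothetical helper: build "skprio:vlan_prio" pairs for skb priorities 0-7,
# all mapped to the single VLAN priority given as $1.
egress_map() {
  local up=$1 out="" p
  for p in 0 1 2 3 4 5 6 7; do
    out="$out${out:+ }$p:$up"
  done
  echo "$out"
}

# Usage (assumed iproute2 syntax; run as root):
#   ip link set dev p3p1.100 type vlan egress-qos-map $(egress_map 0)
```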

Set Egress Priority 4 for RoCE Traffic

# Map all 16 kernel priorities (skprio) to L2 user priority (UP) 4 for RoCE
tc_wrap.py -i p3p1 -u 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
  • RoCE traffic bypasses the kernel network stack, so vconfig and other kernel-level settings do not affect it.
  • tc_wrap.py maps up to 16 skprio values (kernel socket priorities) to L2 user priorities (UP). Here, all 16 are mapped to UP 4.
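Typing sixteen identical values by hand is error-prone. The hypothetical skprio2up_list helper below generates the comma-separated list for a given UP; the default of 16 entries matches the number of kernel priorities listed under UP 4 in the output of this setup.

```shell
#!/bin/sh
# Hypothetical helper: build the comma-separated skprio-to-UP list passed to
# `tc_wrap.py -u`, one entry per kernel priority (16 by default).
skprio2up_list() {
  local up=$1 n=${2:-16} i=0 out=""
  while [ "$i" -lt "$n" ]; do
    out="$out${out:+,}$up"
    i=$((i + 1))
  done
  echo "$out"
}

# Usage (as root):
#   tc_wrap.py -i p3p1 -u "$(skprio2up_list 4)"
```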

Output

$ for i in {0..7}; do vconfig set_egress_map p3p1.100 $i 0 ; done
Set egress mapping on device -:p3p1.100:- Should be visible in /proc/net/vlan/p3p1.100
Set egress mapping on device -:p3p1.100:- Should be visible in /proc/net/vlan/p3p1.100
Set egress mapping on device -:p3p1.100:- Should be visible in /proc/net/vlan/p3p1.100
Set egress mapping on device -:p3p1.100:- Should be visible in /proc/net/vlan/p3p1.100
Set egress mapping on device -:p3p1.100:- Should be visible in /proc/net/vlan/p3p1.100
Set egress mapping on device -:p3p1.100:- Should be visible in /proc/net/vlan/p3p1.100
Set egress mapping on device -:p3p1.100:- Should be visible in /proc/net/vlan/p3p1.100
Set egress mapping on device -:p3p1.100:- Should be visible in /proc/net/vlan/p3p1.100
$ tc_wrap.py -i p3p1
# The mapping below results from vconfig set_egress_map (kernel flow).
UP 0
skprio: 0 (vlan 100)
skprio: 1 (vlan 100)
skprio: 2 (vlan 100 tos: 8)
skprio: 3 (vlan 100)
skprio: 4 (vlan 100 tos: 24)
skprio: 5 (vlan 100)
skprio: 6 (vlan 100 tos: 16)
skprio: 7 (vlan 100)
UP 1
UP 2
UP 3
UP 4
UP 5
UP 6
UP 7
$ tc_wrap.py -i p3p1 -u 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
# The mapping below results from tc_wrap.py -u (kernel-bypass flow, RoCE).
skprio2up is available only for RoCE in kernels that don't support set_egress_map
Traffic classes are set to 8
UP 0
skprio: 0 (vlan 100)
skprio: 1 (vlan 100)
skprio: 2 (vlan 100 tos: 8)
skprio: 3 (vlan 100)
skprio: 4 (vlan 100 tos: 24)
skprio: 5 (vlan 100)
skprio: 6 (vlan 100 tos: 16)
UP 1
UP 2
UP 3
UP 4
skprio: 0
skprio: 1
skprio: 2 (tos: 8)
skprio: 3
skprio: 4 (tos: 24)
skprio: 5
skprio: 6 (tos: 16)
skprio: 7
skprio: 8
skprio: 9
skprio: 10
skprio: 11
skprio: 12
skprio: 13
skprio: 14
skprio: 15
UP 5
UP 6
UP 7
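To check the result without reading the whole listing, the hypothetical up_summary helper below counts the skprio entries printed under each UP heading. It assumes the tc_wrap.py output format shown above ("UP n" lines followed by "skprio:" lines); after the -u call, UP 4 should report 16 entries.

```shell
#!/bin/sh
# Hypothetical helper: read tc_wrap.py output on stdin and count the
# "skprio:" lines listed under each "UP n" heading.
up_summary() {
  awk '
    /^UP [0-7]$/ { up = $2; seen = 1; count[up] = 0; next }
    /^skprio:/ && seen { count[up]++ }
    END {
      for (u = 0; u <= 7; u++)
        if (u in count) printf "UP %d: %d skprio\n", u, count[u]
    }
  '
}

# Usage (expect "UP 4: 16 skprio" after mapping all priorities to UP 4):
#   tc_wrap.py -i p3p1 | up_summary
```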

References