Lab equipment
Dell PowerEdge R710 (16 cores) + Intel 82599 10G NIC + 72G RAM
Dell PowerEdge R210 (8 cores) + Intel 82599 10G NIC + 32G RAM
Network setup:
                   /external vlan |<------------->| eth1 <---> iperf client \
Dell R710 (ADC VE) |                                                         | Dell R210
                   \internal vlan |<------------->| eth2 <---> iperf server  /
Note: since I only have two physical servers, with the Dell R710 acting as the host for the ADC VE, I have to use the Dell R210 as both the iperf server and the iperf client. I therefore used Linux network namespaces to isolate the IP and routing spaces, so that a packet from the iperf client egresses the physical NIC eth1, is forwarded by the BIG-IP VE, and comes back in through the physical NIC eth2 to be processed by the iperf server. Here is a simple bash script to set up the Linux network namespaces:
#!/usr/bin/env bash
set -x

NS1="ns1"
NS2="ns2"
DEV1="em1"
DEV2="em2"
IP1="10.1.72.62"
IP2="10.2.72.62"
NET1="10.1.0.0/16"
NET2="10.2.0.0/16"
GW1="10.1.72.1"
GW2="10.2.72.1"

if [[ $EUID -ne 0 ]]; then
    echo "You must be root to run this script"
    exit 1
fi

# Remove the namespaces if they already exist.
ip netns del $NS1 &>/dev/null
ip netns del $NS2 &>/dev/null

# Create the namespaces.
ip netns add $NS1
ip netns add $NS2

# Move the physical interfaces into the namespaces.
ip link set dev $DEV1 netns $NS1
ip link set dev $DEV2 netns $NS2

# Set up namespace IP addresses and routes.
ip netns exec $NS1 ip addr add $IP1/16 dev $DEV1
ip netns exec $NS1 ip link set $DEV1 up
ip netns exec $NS1 ip link set lo up
ip netns exec $NS1 ip route add $NET2 via $GW1 dev $DEV1

ip netns exec $NS2 ip addr add $IP2/16 dev $DEV2
ip netns exec $NS2 ip link set $DEV2 up
ip netns exec $NS2 ip link set lo up
ip netns exec $NS2 ip route add $NET1 via $GW2 dev $DEV2

# Enable IP forwarding.
echo 1 > /proc/sys/net/ipv4/ip_forward

# Get into a namespace:
#ip netns exec ${NS} /bin/bash --rcfile <(echo "PS1=\"${NS}> \"")
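With the namespaces in place, the iperf server and client run inside ns2 and ns1 via "ip netns exec". A minimal sketch (the server invocation is my assumption; only the client command appears in the test output below):

# Start the iperf server inside ns2, listening on 10.2.72.62.
ip netns exec ns2 /home/dpdk/iperf -s &

# Run the iperf client inside ns1 against the server address,
# 1024-byte payloads and 64 parallel streams, matching the tests below.
ip netns exec ns1 /home/dpdk/iperf -c 10.2.72.62 -l 1024 -P 64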
On the ADC VE I set up a simple forwarding virtual server that simply forwards packets between the external and internal VLANs.
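For reference, a wildcard IP-forwarding virtual server can be created from tmsh roughly as follows; this is only a sketch, and the virtual server name and VLAN names are assumptions, not taken from the test system:

# Hypothetical example: forwarding (IP) virtual server enabled on the
# assumed "external" and "internal" VLANs.
tmsh create ltm virtual vs_forward_all destination 0.0.0.0:0 mask 0.0.0.0 ip-forward vlans-enabled vlans add { external internal }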
This is the default throughput output without any performance tuning:
ns1> /home/dpdk/iperf -c 10.2.72.62 -l 1024 -P 64
...............
................
[ 25]  0.0-10.2 sec  46.0 MBytes  37.9 Mbits/sec
[SUM]  0.0-10.2 sec  3.22 GBytes  2.72 Gbits/sec    <======= 2.72 Gbits
Here is what the top output for the vhost data-plane kernel threads of the ADC VE looks like while passing traffic:
  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                      P
23329 libvirt+  20   0 35.366g 0.030t 23396 S 262.5 43.4 153:31.10 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime m+   1
23332 root      20   0       0      0     0 R  17.9  0.0   1:35.98 [vhost-23329]    1
23336 root      20   0       0      0     0 R  17.9  0.0   1:18.20 [vhost-23329]    1
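A quick way to list the vhost kernel threads for this guest and see which CPU each one last ran on (my addition, not captured during the original run):

# psr = processor the thread last ran on; 23329 is the qemu PID shown above.
ps -eo pid,psr,pcpu,comm | grep 'vhost-23329'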
As you can see, only two vhost kernel threads show up, each at 17.9% CPU usage, which indicates that vhost is not fully scheduled to pass data traffic for the guest machine. I have defined 4 tx/rx queue pairs for each macvtap on the physical 10G interfaces, and two macvtaps are assigned to the ADC VE for the external and internal VLANs. Ideally, there should be 8 vhost kernel threads showing up in top, all fully scheduled to pass traffic. The interface XML dump is below:
<interface type='bridge'>
  <mac address='52:54:00:55:47:05'/>
  <source bridge='br0'/>
  <target dev='vnet1'/>
  <model type='virtio'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<interface type='direct'>
  <mac address='52:54:00:f9:98:e9'/>
  <source dev='enp4s0f0' mode='vepa'/>
  <target dev='macvtap2'/>
  <model type='virtio'/>
  <driver name='vhost' queues='4'/>
  <alias name='net1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>
<interface type='direct'>
  <mac address='52:54:00:4b:06:c4'/>
  <source dev='enp4s0f1' mode='vepa'/>
  <target dev='macvtap3'/>
  <model type='virtio'/>
  <driver name='vhost' queues='4'/>
  <alias name='net2'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</interface>
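A change such as adding queues='4' to the vhost driver line is applied by editing the libvirt domain definition and restarting the guest; a minimal sketch using the domain name from the pin output below:

# Edit the domain XML (the <interface> sections shown above).
virsh edit bigip-virtio
# Restart the guest so the new queue configuration takes effect.
virsh shutdown bigip-virtio
virsh start bigip-virtio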
vCPU pin assigned:

root@Dell710:~# virsh vcpupin bigip-virtio
VCPU: CPU Affinity
----------------------------------
   0: 0
   1: 2
   2: 4
   3: 6
   4: 8
   5: 10
   6: 12
   7: 14
   8: 2
   9: 4

vhost CPU pin:

~# virsh emulatorpin bigip-virtio
emulator: CPU Affinity
----------------------------------
       *: 0,2,4,6,8,10,12,14

NUMA node layout:

# lscpu --parse=node,core,cpu
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID
# starting from zero.
# Node,Core,CPU
0,0,0
1,1,1
0,2,2
1,3,3
0,4,4
1,5,5
0,6,6
1,7,7
0,0,8
1,1,9
0,2,10
1,3,11
0,4,12
1,5,13
0,6,14
1,7,15

So the odd CPUs are on NUMA node 1 and the even CPUs are on NUMA node 0; the guest is pinned to NUMA node 0 and vhost is pinned to NUMA node 0 too, which should be good. So why the low throughput? Let's try assigning vhost to the NUMA node 1 CPUs:

# virsh emulatorpin bigip-virtio 1,3,5,7,9,11,13,15
# virsh emulatorpin bigip-virtio
emulator: CPU Affinity
----------------------------------
       *: 1,3,5,7,9,11,13,15

Now run the test again:

[SUM]  0.0-10.1 sec  10.1 GBytes  8.58 Gbits/sec    <========= 8.58G, big difference!!!

  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                      P
23344 libvirt+  20   0 35.350g 0.030t 23396 R  99.9 43.4  15:40.95 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime ml+   6
23341 libvirt+  20   0 35.350g 0.030t 23396 R  99.9 43.4  17:39.58 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime ml+   0
23346 libvirt+  20   0 35.350g 0.030t 23396 R  99.9 43.4  15:23.76 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime ml+  10
23347 libvirt+  20   0 35.350g 0.030t 23396 R  99.9 43.4  15:29.99 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime ml+  12
23345 libvirt+  20   0 35.350g 0.030t 23396 R  99.7 43.4  15:29.29 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime ml+   8
23348 libvirt+  20   0 35.350g 0.030t 23396 R  99.7 43.4  15:42.95 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime ml+  14
23342 libvirt+  20   0 35.350g 0.030t 23396 R  98.7 43.4  14:58.66 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime ml+   2
23343 libvirt+  20   0 35.350g 0.030t 23396 R  96.0 43.4  14:58.54 qemu-system-x86_64 -enable-kvm -name bigip-virtio -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 31357 -realtime ml+   4
23332 root      20   0       0      0     0 R  40.2  0.0   1:12.12 [vhost-23329]   15
23333 root      20   0       0      0     0 R  40.2  0.0   1:05.58 [vhost-23329]   13
23335 root      20   0       0      0     0 R  40.2  0.0   1:04.98 [vhost-23329]    3
23334 root      20   0       0      0     0 R  39.2  0.0   1:04.52 [vhost-23329]    1
23337 root      20   0       0      0     0 R  32.2  0.0   0:47.66 [vhost-23329]   11
23339 root      20   0       0      0     0 R  31.6  0.0   0:50.47 [vhost-23329]   15
23336 root      20   0       0      0     0 S  31.2  0.0   0:56.08 [vhost-23329]    5
23338 root      20   0       0      0     0 R  30.2  0.0   0:49.52 [vhost-23329]

This tells us that something in the host kernel is keeping the NUMA node 0 CPUs busy, so the 8 vhost threads cannot get scheduled enough to process the data traffic. My theory is that the physical NIC IRQs are spread across the even cores on NUMA node 0 and softirq runs high on those even cores, so the vhost kernel threads did not get enough time to run there; assigning vhost to the idle cores on NUMA node 1 gives vhost enough CPU cycles to process the data packets.
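One way to check this theory (my addition, not something I captured during the test) is to look at where the 10G NIC's IRQ vectors are allowed to run and how much softirq time each CPU is burning:

# List the IRQ vectors of the Intel 82599 ports (enp4s0f0/enp4s0f1).
grep enp4s0f /proc/interrupts

# Show the CPU affinity of one of those IRQs (replace <irq> with a number from above).
cat /proc/irq/<irq>/smp_affinity_list

# Per-CPU utilization including %soft; high softirq on the even cores
# during the test would support the theory.
mpstat -P ALL 1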