Thursday, October 8, 2015

2.5 million TCP/HTTP connections with mTCP and DPDK on a Dell PowerEdge R210 II (8 cores, 32 GB RAM, Intel I350 NIC)

Background:

The Secret to 10 Million Concurrent Connections - The Kernel is the Problem, Not the Solution


 Problem:

http://www.slideshare.net/garyachy/dpdk-44585840?qid=254b419f-1d44-44f1-99c4-87f13b7d5fe4&v=default&b=&from_search=8

Packet processing in Linux: NIC RX/TX queues <---> Ring buffers <---> Driver <---> Socket <---> App


  • System calls
  • Context switching on blocking I/O
  • Data copying from kernel to user space
  • Interrupt handling in the kernel

Expense of a single sendto call (a minimal code sketch follows the breakdown):

  • sendto - system call: 96ns
  • sosend_dgram - lock sock_buff, alloc mbuf, copy in: 137ns
  • udp_output - UDP header setup: 57ns
  • ip_output - route lookup, IP header setup: 198ns
  • ether_output - MAC lookup, MAC header setup: 162ns
  • ixgbe_xmit - device programming: 220ns
Total: 950ns
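
To make those numbers concrete, here is a minimal sketch of the conventional path (not taken from the mTCP/DPDK code; the destination address and payload size are invented for illustration). Every iteration of the loop pays the system call, the copy into kernel buffers, and the full protocol/driver walk listed above:

/* Conventional kernel path: every datagram costs a syscall plus a copy
 * from user space into kernel buffers, then udp_output -> ip_output ->
 * ether_output -> driver TX. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = { 0 };
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9000);
    inet_pton(AF_INET, "10.9.3.6", &dst.sin_addr);

    char payload[64] = { 0 };
    for (int i = 0; i < 1000000; i++) {
        /* roughly a microsecond of kernel work, paid on every packet */
        sendto(fd, payload, sizeof(payload), 0,
               (struct sockaddr *)&dst, sizeof(dst));
    }
    close(fd);
    return 0;
}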

Solution:
Packet processing with DPDK:
NIC RX/TX queues <---> Ring buffers <---> DPDK <---> App

  • Processor affinity (separate cores)
  • Huge pages (no swapping, fewer TLB misses)
  • UIO (no copying from kernel)
  • Polling (no interrupt overhead; see the receive-loop sketch below)
  • Lockless synchronization (avoid waiting)
  • Batched packet handling
  • SSE, NUMA awareness
UIO, for example:
Kernel space (UIO framework) <---> /dev/uioX <---> userspace epoll/mmap <---> App
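
For a sense of what the poll-mode path looks like in code, here is a rough DPDK receive-loop sketch (this is not the epwget/mTCP code; device, queue and mempool setup are omitted, and the port/burst constants are illustrative):

/* One lcore busy-polls its own RX queue and handles packets in batches:
 * no interrupts, no system calls, no copy into kernel buffers. */
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define RX_QUEUE   0
#define BURST_SIZE 32

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return EXIT_FAILURE;

    /* rte_eth_dev_configure(), rte_eth_rx_queue_setup(),
     * rte_eth_dev_start() and mempool creation omitted for brevity. */
    uint8_t port = 0;
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Pull up to BURST_SIZE packets straight off the NIC ring,
         * which is mapped into user space via hugepage-backed memory. */
        uint16_t nb_rx = rte_eth_rx_burst(port, RX_QUEUE, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb_rx; i++) {
            /* ... parse / respond to bufs[i] here ... */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;
}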


Problem:
Limitations of the kernel's TCP stack:
  • Lack of connection locality
  • Shared file descriptor space
  • Inefficient per-packet processing
  • System call overhead
Solution:
  • Batching in packet I/O, TCP processing, and user applications (reduces system call overhead)
  • Connection locality on multicore systems: handle the same connection on the same core to avoid cache pollution
  • No file descriptor sharing between mTCP threads (see the per-core sketch below)
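
The pattern mTCP's example apps (epwget, the apachebench port) follow is roughly the sketch below: one thread per core, each with its own mTCP context and its own epoll descriptor, so connections never migrate between cores and no descriptor table is shared. Function names are taken from the mTCP example apps; verify the exact signatures against mtcp/include/mtcp_api.h and mtcp_epoll.h.

/* Per-core mTCP worker: private context, private epoll, no shared fds. */
#include <mtcp_api.h>
#include <mtcp_epoll.h>

#define MAX_EVENTS 8192

void *worker(void *arg)
{
    int core = *(int *)arg;

    mtcp_core_affinitize(core);               /* pin this thread to its core */
    mctx_t mctx = mtcp_create_context(core);  /* per-core mTCP context        */
    int ep = mtcp_epoll_create(mctx, MAX_EVENTS);

    struct mtcp_epoll_event events[MAX_EVENTS];
    for (;;) {
        /* Only this core's connections show up here; TCP processing is
         * batched inside mTCP and no locks are taken on the fd space. */
        int n = mtcp_epoll_wait(mctx, ep, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].events & MTCP_EPOLLIN) {
                /* mtcp_read(mctx, events[i].data.sockid, ...) */
            }
        }
    }
    return NULL;
}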

Clone of mTCP
Additions in this clone:
  • Changed the apachebench configure script to compile with DPDK support
  • Ported SSL BIO onto mTCP to enable apachebench to perform SSL tests
  • Added an SSL ClientHello stress test based on epwget and ssl-dos
  • Added a command-line option in epwget and apachebench to enable a source address pool for congesting servers
  • Increased the mTCP SYN backlog to allow more concurrent connections
  • Changed the DPDK .config to compile DPDK as a combined shared library
  • Tuned send/receive buffer sizes in epwget.conf to reach 2.5 million concurrent connections on a Dell PowerEdge R210 II (32 GB RAM, 8 cores, Intel I350 NIC); a sample configuration follows
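
For reference, the epwget.conf for the run below would look roughly like this. Key names follow the stock mTCP sample configs and the values mirror the configuration printout further down; treat it as an approximation rather than the exact file used:

# epwget.conf (sketch)
port = dpdk0
num_cores = 8
max_concurrency = 1000000      # per-core concurrency cap
max_num_buffers = 1000000      # per-core preallocated buffers
rcvbuf = 1024                  # small buffers keep 2.5M flows within 32 GB
sndbuf = 1024
tcp_timeout = 30
tcp_timewait = 0
stat_print = dpdk0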

HTTP load testing example:
root@pktgen:/usr/src/mtcp# LD_LIBRARY_PATH=.:/usr/src/mtcp/dpdk/lib LD_PRELOAD=$* ./apps/example/epwget 10.9.3.6/ 16000000 -N 8 -c 2500000
Application configuration:
URL: /
# of total_flows: 16000000
# of cores: 8
Concurrency: 2500000
---------------------------------------------------------------------------------
Loading mtcp configuration from : epwget.conf
Loading interface setting
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Detected lcore 2 as core 1 on socket 0
EAL: Detected lcore 3 as core 1 on socket 0
EAL: Detected lcore 4 as core 2 on socket 0
EAL: Detected lcore 5 as core 2 on socket 0
EAL: Detected lcore 6 as core 3 on socket 0
EAL: Detected lcore 7 as core 3 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 8 lcore(s)
EAL: Auto-detected process type: PRIMARY
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fea80600000 (size = 0x200000)
EAL: Ask a virtual area of 0x7000000 bytes
EAL: Virtual area found at 0x7fea79400000 (size = 0x7000000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fea79000000 (size = 0x200000)
EAL: Ask a virtual area of 0x38c00000 bytes
EAL: Virtual area found at 0x7fea40200000 (size = 0x38c00000)
EAL: Requesting 512 pages of size 2MB from socket 0
EAL: TSC frequency is ~3199969 KHz
EAL: open shared lib /usr/src/mtcp/dpdk/lib/librte_pmd_e1000.so
EAL: Master lcore 0 is ready (tid=825c7900;cpuset=[0])
EAL: lcore 4 is ready (tid=3dff5700;cpuset=[4])
EAL: lcore 5 is ready (tid=3d7f4700;cpuset=[5])
EAL: lcore 6 is ready (tid=3cff3700;cpuset=[6])
EAL: lcore 1 is ready (tid=3f7f8700;cpuset=[1])
EAL: lcore 2 is ready (tid=3eff7700;cpuset=[2])
EAL: lcore 3 is ready (tid=3e7f6700;cpuset=[3])
EAL: lcore 7 is ready (tid=3c7f2700;cpuset=[7])
EAL: PCI device 0000:01:00.0 on NUMA socket -1
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: PCI memory mapped at 0x7fea8248b000
EAL: PCI memory mapped at 0x7fea82487000
PMD: eth_igb_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1521
EAL: PCI device 0000:01:00.1 on NUMA socket -1
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: PCI memory mapped at 0x7fea80500000
EAL: PCI memory mapped at 0x7fea82483000
PMD: eth_igb_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x1521
EAL: PCI device 0000:01:00.2 on NUMA socket -1
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: PCI memory mapped at 0x7fea80400000
EAL: PCI memory mapped at 0x7fea8247f000
PMD: eth_igb_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x1521
EAL: PCI device 0000:01:00.3 on NUMA socket -1
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: PCI memory mapped at 0x7fea79300000
EAL: PCI memory mapped at 0x7fea8247b000
PMD: eth_igb_dev_init(): port_id 3 vendorID=0x8086 deviceID=0x1521
Total number of attached devices: 1
Interface name: dpdk0
Configurations:
Number of CPU cores available: 8
Number of CPU cores to use: 8
Maximum number of concurrency per core: 1000000
Number of source ip to use: 64
Maximum number of preallocated buffers per core: 1000000
Receive buffer size: 1024
Send buffer size: 1024
TCP timeout seconds: 30
TCP timewait seconds: 0
NICs to print statistics: dpdk0
---------------------------------------------------------------------------------
Interfaces:
name: dpdk0, ifindex: 0, hwaddr: A0:36:9F:A1:4D:6C, ipaddr: 10.9.3.9, netmask: 255.255.255.0
Number of NIC queues: 8
---------------------------------------------------------------------------------
Loading routing configurations from : config/route.conf
Routes:
Destination: 10.9.3.0/24, Mask: 255.255.255.0, Masked: 10.9.3.0, Route: ifdx-0
Destination: 10.9.3.0/24, Mask: 255.255.255.0, Masked: 10.9.3.0, Route: ifdx-0
Destination: 10.9.1.0/24, Mask: 255.255.255.0, Masked: 10.9.1.0, Route: ifdx-0
---------------------------------------------------------------------------------
Loading ARP table from : config/arp.conf
ARP Table:
IP addr: 10.9.3.1, dst_hwaddr: 52:54:00:2E:62:A2
IP addr: 10.9.3.6, dst_hwaddr: 52:54:00:2E:62:A2
IP addr: 10.9.1.2, dst_hwaddr: 00:0A:F7:7C:57:E1
---------------------------------------------------------------------------------
Initializing port 0... PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fea79ee7300 hw_ring=0x7fea80721880 dma_addr=0x35b21880
PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fea79ee6e00 hw_ring=0x7fea80731880 dma_addr=0x35b31880
PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fea79ee6900 hw_ring=0x7fea80741880 dma_addr=0x35b41880
PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fea79ee6400 hw_ring=0x7fea80751880 dma_addr=0x35b51880
PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fea79ee5f00 hw_ring=0x7fea80761880 dma_addr=0x35b61880
PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fea79ee5a00 hw_ring=0x7fea80771880 dma_addr=0x35b71880
PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fea79ee5500 hw_ring=0x7fea80781880 dma_addr=0x35b81880
PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fea79ee5000 hw_ring=0x7fea80791880 dma_addr=0x35b91880
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fea79ee4700 hw_ring=0x7fea807a1880 dma_addr=0x35ba1880
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fea79ee3e00 hw_ring=0x7fea807b1880 dma_addr=0x35bb1880
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fea79ee3500 hw_ring=0x7fea807c1880 dma_addr=0x35bc1880
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fea79ee2c00 hw_ring=0x7fea807d1880 dma_addr=0x35bd1880
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fea79ee2300 hw_ring=0x7fea807e1880 dma_addr=0x35be1880
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fea79ee1a00 hw_ring=0x7fea79000000 dma_addr=0x7ce400000
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fea79ee1100 hw_ring=0x7fea79010000 dma_addr=0x7ce410000
PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fea79ee0800 hw_ring=0x7fea79020000 dma_addr=0x7ce420000
PMD: eth_igb_start(): <<
PMD: rte_eth_dev_config_restore: port 0: MAC address array not supported
done:
Checking link status.....................................done
Port 0 Link Up - speed 1000 Mbps - full-duplex
Configuration updated by mtcp_setconf().
Configurations:
Number of CPU cores available: 8
Number of CPU cores to use: 8
Maximum number of concurrency per core: 937500
Number of source ip to use: 64
Maximum number of preallocated buffers per core: 937500
Receive buffer size: 1024
Send buffer size: 1024
TCP timeout seconds: 30
TCP timewait seconds: 0
NICs to print statistics: dpdk0
---------------------------------------------------------------------------------
CPU 3: initialization finished.
[mtcp_create_context:1174] CPU 3 is in charge of printing stats.
[CPU 3] dpdk0 flows: 0, RX: 64(pps) (err: 0), 0.00(Gbps), TX: 64(pps), 0.00(Gbps)
[ ALL ] dpdk0 flows: 0, RX: 64(pps) (err: 0), 0.00(Gbps), TX: 64(pps), 0.00(Gbps)
CPU 5: initialization finished.
CPU 2: initialization finished.
CPU 4: initialization finished.
CPU 6: initialization finished.
CPU 7: initialization finished.
CPU 0: initialization finished.
CPU 1: initialization finished.
[WARINING] Available # addresses (516088) is smaller than the max concurrency (937500).
Thread 3 handles 2000000 flows. connecting to 10.9.3.6:80
[WARINING] Available # addresses (516088) is smaller than the max concurrency (937500).
[WARINING] Available # addresses (516088) is smaller than the max concurrency (937500).
Thread 4 handles 2000000 flows. connecting to 10.9.3.6:80
Thread 5 handles 2000000 flows. connecting to 10.9.3.6:80
[CPU 0] dpdk0 flows: 0, RX: 5152(pps) (err: 0), 0.01(Gbps), TX: 5094(pps), 0.00(Gbps)
[CPU 1] dpdk0 flows: 0, RX: 4563(pps) (err: 0), 0.01(Gbps), TX: 4563(pps), 0.00(Gbps)
[CPU 2] dpdk0 flows: 0, RX: 4855(pps) (err: 0), 0.01(Gbps), TX: 4855(pps), 0.00(Gbps)
[CPU 3] dpdk0 flows: 28000, RX: 4716(pps) (err: 0), 0.01(Gbps), TX: 9975(pps), 0.01(Gbps)
[CPU 4] dpdk0 flows: 13579, RX: 5357(pps) (err: 0), 0.01(Gbps), TX: 5347(pps), 0.00(Gbps)
[CPU 5] dpdk0 flows: 12891, RX: 4985(pps) (err: 0), 0.01(Gbps), TX: 4946(pps), 0.00(Gbps)
[CPU 6] dpdk0 flows: 0, RX: 4612(pps) (err: 0), 0.01(Gbps), TX: 4612(pps), 0.00(Gbps)
[CPU 7] dpdk0 flows: 0, RX: 5060(pps) (err: 0), 0.01(Gbps), TX: 5060(pps), 0.00(Gbps)
[ ALL ] dpdk0 flows: 54470, RX: 39300(pps) (err: 0), 0.06(Gbps), TX: 44452(pps), 0.03(Gbps)
[WARINING] Available # addresses (516087) is smaller than the max concurrency (937500).
[WARINING] Available # addresses (516088) is smaller than the max concurrency (937500).
[WARINING] Available # addresses (516088) is smaller than the max concurrency (937500).
[WARINING] Available # addresses (516088) is smaller than the max concurrency (937500).
[WARINING] Available # addresses (516088) is smaller than the max concurrency (937500).
Thread 0 handles 2000000 flows. connecting to 10.9.3.6:80
Thread 1 handles 2000000 flows. connecting to 10.9.3.6:80
Thread 6 handles 2000000 flows. connecting to 10.9.3.6:80
Thread 2 handles 2000000 flows. connecting to 10.9.3.6:80
Thread 7 handles 2000000 flows. connecting to 10.9.3.6:80
[CPU 0] dpdk0 flows: 312500, RX: 9719(pps) (err: 0), 0.01(Gbps), TX: 117560(pps), 0.09(Gbps)
[CPU 1] dpdk0 flows: 303921, RX: 8151(pps) (err: 0), 0.01(Gbps), TX: 110036(pps), 0.09(Gbps)
[CPU 2] dpdk0 flows: 312500, RX: 9740(pps) (err: 0), 0.01(Gbps), TX: 120092(pps), 0.09(Gbps)
[CPU 3] dpdk0 flows: 312500, RX: 12824(pps) (err: 0), 0.01(Gbps), TX: 146624(pps), 0.11(Gbps)
[CPU 4] dpdk0 flows: 312500, RX: 10267(pps) (err: 0), 0.01(Gbps), TX: 133376(pps), 0.10(Gbps)
[CPU 5] dpdk0 flows: 312500, RX: 10808(pps) (err: 0), 0.01(Gbps), TX: 128448(pps), 0.10(Gbps)
[CPU 6] dpdk0 flows: 307636, RX: 8108(pps) (err: 0), 0.01(Gbps), TX: 113976(pps), 0.09(Gbps)
[CPU 7] dpdk0 flows: 312500, RX: 9692(pps) (err: 0), 0.01(Gbps), TX: 111934(pps), 0.09(Gbps)
[ ALL ] dpdk0 flows: 2486557, RX: 79309(pps) (err: 0), 0.08(Gbps), TX: 982046(pps), 0.77(Gbps)
[ ALL ] connect: 2497968, read: 0 MB, write: 1 MB, completes: 0 (resp_time avg: 0, max: 0 us)
[ ALL ] connect: 2032, read: 0 MB, write: 0 MB, completes: 0 (resp_time avg: 0, max: 0 us)
[CPU 0] dpdk0 flows: 312500, RX: 8800(pps) (err: 0), 0.01(Gbps), TX: 149696(pps), 0.12(Gbps)
[CPU 1] dpdk0 flows: 312500, RX: 7700(pps) (err: 0), 0.01(Gbps), TX: 148736(pps), 0.12(Gbps)
[CPU 2] dpdk0 flows: 312500, RX: 7681(pps) (err: 0), 0.01(Gbps), TX: 150016(pps), 0.12(Gbps)
[CPU 3] dpdk0 flows: 312500, RX: 7654(pps) (err: 0), 0.01(Gbps), TX: 150016(pps), 0.12(Gbps)
[CPU 4] dpdk0 flows: 312500, RX: 7709(pps) (err: 0), 0.01(Gbps), TX: 150016(pps), 0.12(Gbps)
[CPU 5] dpdk0 flows: 312500, RX: 7595(pps) (err: 0), 0.01(Gbps), TX: 150016(pps), 0.12(Gbps)
[CPU 6] dpdk0 flows: 312500, RX: 7548(pps) (err: 0), 0.01(Gbps), TX: 149248(pps), 0.12(Gbps)
[CPU 7] dpdk0 flows: 312500, RX: 8007(pps) (err: 0), 0.01(Gbps), TX: 150016(pps), 0.12(Gbps)
[ ALL ] dpdk0 flows: 2500000, RX: 62694(pps) (err: 0), 0.07(Gbps), TX: 1197760(pps), 0.94(Gbps)
[ ALL ] connect: 0, read: 0 MB, write: 1 MB, completes: 0 (resp_time avg: 0, max: 0 us)
[CPU 0] dpdk0 flows: 312500, RX: 8640(pps) (err: 0), 0.01(Gbps), TX: 149440(pps), 0.12(Gbps)
[CPU 1] dpdk0 flows: 312500, RX: 6472(pps) (err: 0), 0.01(Gbps), TX: 149440(pps), 0.12(Gbps)
[CPU 2] dpdk0 flows: 312500, RX: 6130(pps) (err: 0), 0.01(Gbps), TX: 149440(pps), 0.12(Gbps)
[CPU 3] dpdk0 flows: 312500, RX: 6974(pps) (err: 0), 0.01(Gbps), TX: 149440(pps), 0.12(Gbps)
[CPU 4] dpdk0 flows: 312500, RX: 7821(pps) (err: 0), 0.01(Gbps), TX: 149440(pps), 0.12(Gbps)
[CPU 5] dpdk0 flows: 312500, RX: 6535(pps) (err: 0), 0.01(Gbps), TX: 149440(pps), 0.12(Gbps)
[CPU 6] dpdk0 flows: 312500, RX: 7593(pps) (err: 0), 0.01(Gbps), TX: 149440(pps), 0.12(Gbps)
[CPU 7] dpdk0 flows: 312500, RX: 7416(pps) (err: 0), 0.01(Gbps), TX: 149440(pps), 0.12(Gbps)
[ ALL ] dpdk0 flows: 2500000, RX: 57581(pps) (err: 0), 0.06(Gbps), TX: 1195520(pps), 0.93(Gbps) <=============== 2.5 million concurrent connections
[ ALL ] connect: 0, read: 0 MB, write: 1 MB, completes: 0 (resp_time avg: 0, max: 0 us)

------------------------------------------------------------------
Ltm::Virtual Server: vs_http
------------------------------------------------------------------
Status
Availability : unknown
State : enabled
Reason : The children pool member(s) either don't have service checking enabled, or service check results are not available yet
CMP : enabled
CMP Mode : all-cpus
Destination : 10.9.3.6:80
Traffic ClientSide Ephemeral General
Bits In 56.4G 0 -
Bits Out 95.0G 0 -
Packets In 92.7M 0 -
Packets Out 111.5M 0 -
Current Connections 1.0M 0 - <================= 1 million concurrent connections; could be higher, but the VE can't sustain a higher concurrency rate.
Maximum Connections 1.2M 0 -
Total Connections 8.2M 0 -
Evicted Connections 0 0 -
Slow Connections Killed 0 0 -
Min Conn Duration/msec - - 156
Max Conn Duration/msec - - 2.4M
Mean Conn Duration/msec - - 609.8K
Total Requests - - 0

Example SSL ClientHello load test:

root@pktgen:/usr/src/mtcp# LD_LIBRARY_PATH=.:/usr/src/mtcp/dpdk/lib LD_PRELOAD=$* ./apps/ssl-dos/brute-shake 10.9.3.6 1600000 -N 8 -c 250000
Application configuration:
Host: 10.9.3.6
# of total_flows: 1600000
# of cores: 8
Concurrency: 250000
---------------------------------------------------------------------------------
Loading mtcp configuration from : mtcp-brute-shake.conf
Loading interface setting

[CPU 0] dpdk0 flows: 31250, RX: 11485(pps) (err: 0), 0.01(Gbps), TX: 71028(pps), 0.09(Gbps)
[CPU 1] dpdk0 flows: 31250, RX: 10897(pps) (err: 0), 0.01(Gbps), TX: 72320(pps), 0.10(Gbps)
[CPU 2] dpdk0 flows: 31250, RX: 3960(pps) (err: 0), 0.00(Gbps), TX: 54759(pps), 0.06(Gbps)
[CPU 3] dpdk0 flows: 31250, RX: 3413(pps) (err: 0), 0.00(Gbps), TX: 54034(pps), 0.06(Gbps)
[CPU 4] dpdk0 flows: 31250, RX: 9698(pps) (err: 0), 0.01(Gbps), TX: 67136(pps), 0.08(Gbps)
[CPU 5] dpdk0 flows: 31253, RX: 16285(pps) (err: 0), 0.01(Gbps), TX: 79912(pps), 0.15(Gbps)
[CPU 6] dpdk0 flows: 31250, RX: 11328(pps) (err: 0), 0.01(Gbps), TX: 70734(pps), 0.09(Gbps)
[CPU 7] dpdk0 flows: 31250, RX: 11252(pps) (err: 0), 0.01(Gbps), TX: 73762(pps), 0.11(Gbps)
[ ALL ] dpdk0 flows: 250003, RX: 78318(pps) (err: 0), 0.06(Gbps), TX: 543685(pps), 0.74(Gbps)
[ ALL ] connect: 71, read: 0 MB, write: 67 MB, completes: 71 (resp_time avg: 107258, max: 1102163 us)
[CPU 0] dpdk0 flows: 31250, RX: 9615(pps) (err: 0), 0.01(Gbps), TX: 54656(pps), 0.14(Gbps)
[CPU 1] dpdk0 flows: 31250, RX: 10655(pps) (err: 0), 0.01(Gbps), TX: 54656(pps), 0.15(Gbps)
[CPU 2] dpdk0 flows: 31250, RX: 8994(pps) (err: 0), 0.01(Gbps), TX: 54656(pps), 0.06(Gbps)
[CPU 3] dpdk0 flows: 31250, RX: 8951(pps) (err: 0), 0.01(Gbps), TX: 54656(pps), 0.05(Gbps)
[CPU 4] dpdk0 flows: 31250, RX: 9472(pps) (err: 0), 0.01(Gbps), TX: 54656(pps), 0.12(Gbps)
[CPU 5] dpdk0 flows: 31421, RX: 8307(pps) (err: 0), 0.01(Gbps), TX: 54656(pps), 0.18(Gbps)
[CPU 6] dpdk0 flows: 31250, RX: 9430(pps) (err: 0), 0.01(Gbps), TX: 54656(pps), 0.14(Gbps)
[CPU 7] dpdk0 flows: 31250, RX: 10946(pps) (err: 0), 0.01(Gbps), TX: 54656(pps), 0.16(Gbps)
[ ALL ] dpdk0 flows: 250171, RX: 76370(pps) (err: 0), 0.06(Gbps), TX: 437248(pps), 1.00(Gbps)
[CPU 0] dpdk0 flows: 31250, RX: 3341(pps) (err: 0), 0.00(Gbps), TX: 42816(pps), 0.16(Gbps)
[CPU 1] dpdk0 flows: 31250, RX: 3989(pps) (err: 0), 0.00(Gbps), TX: 42816(pps), 0.16(Gbps)
[CPU 2] dpdk0 flows: 31250, RX: 9972(pps) (err: 0), 0.01(Gbps), TX: 42816(pps), 0.06(Gbps)
[CPU 3] dpdk0 flows: 31250, RX: 10004(pps) (err: 0), 0.01(Gbps), TX: 42816(pps), 0.06(Gbps)
[CPU 4] dpdk0 flows: 31250, RX: 4438(pps) (err: 0), 0.00(Gbps), TX: 42816(pps), 0.13(Gbps)
[CPU 5] dpdk0 flows: 31572, RX: 5062(pps) (err: 0), 0.01(Gbps), TX: 42816(pps), 0.13(Gbps)
[CPU 6] dpdk0 flows: 31250, RX: 3305(pps) (err: 0), 0.00(Gbps), TX: 42816(pps), 0.15(Gbps)
[CPU 7] dpdk0 flows: 31250, RX: 4161(pps) (err: 0), 0.00(Gbps), TX: 42816(pps), 0.15(Gbps)
[ ALL ] dpdk0 flows: 250322, RX: 44272(pps) (err: 0), 0.04(Gbps), TX: 342528(pps), 1.00(Gbps)
[ ALL ] connect: 259, read: 0 MB, write: 40 MB, completes: 259 (resp_time avg: 230897, max: 2726227 us)
[ ALL ] connect: 61, read: 0 MB, write: 8 MB, completes: 61 (resp_time avg: 364367, max: 3109163 us)

VE CPU usage is high due to the SSL stress test:
top - 11:50:44 up 4 days, 20:52, 2 users, load average: 0.26, 0.18, 0.06
Tasks: 373 total, 6 running, 367 sleeping, 0 stopped, 0 zombie
Cpu0 : 55.1%us, 1.2%sy, 0.0%ni, 1.4%id, 0.0%wa, 0.3%hi, 41.7%si, 0.3%st
Cpu1 : 91.1%us, 4.0%sy, 0.0%ni, 4.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Cpu2 : 89.0%us, 5.5%sy, 0.0%ni, 5.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Cpu3 : 90.7%us, 4.7%sy, 0.0%ni, 4.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 14403124k total, 13995684k used, 407440k free, 112888k buffers
Swap: 1048568k total, 441580k used, 606988k free, 358956k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
10025 root RT 0 11.9g 130m 104m R 96.6 0.9 113:16.67 0 tmm.0 -T 4 --tmid 0 --npus 4 --platform Z100 -m -s 12088
10034 root RT 0 11.9g 130m 104m R 93.5 0.9 90:49.78 3 tmm.0 -T 4 --tmid 0 --npus 4 --platform Z100 -m -s 12088
10032 root RT 0 11.9g 130m 104m R 93.2 0.9 112:50.29 1 tmm.0 -T 4 --tmid 0 --npus 4 --platform Z100 -m -s 12088
10033 root RT 0 11.9g 130m 104m R 93.2 0.9 90:21.94 2 tmm.0 -T 4 --tmid 0 --npus 4 --platform Z100 -m -s 12088

Example of SYN flooding at line rate:
root@pktgen:/usr/src/MoonGen# build/MoonGen examples/l3-tcp-syn-flood.lua 0 10.0.0.1 16000000 10000

PMD: To improve 1G driver performance, consider setting the TX WTHRESH value to 4, 8, or 16.
Port 0 (A0:36:9F:A1:4D:6C) is up: full-duplex 1000 MBit/s
INFO: Detected an IPv4 address.
18:56:23.699493 ETH a0:36:9f:a1:4d:6d > 52:54:00:2e:62:a2 type 0x0800 (IP4)
IP4 10.0.0.1 > 10.9.3.6 ver 4 ihl 5 tos 0 len 46 id 0 flags 0 frag 0 ttl 64 proto 0x06 (TCP) cksum 0x0000
TCP 1025 > 443 seq# 1 ack# 0 offset 0x5 reserved 0x00 flags 0x02 [X|X|X|X|SYN|X] win 10 cksum 0x0000 urg 0
0x0000: 5254 002e 62a2 a036 9fa1 4d6d 0800 4500
0x0010: 002e 0000 0000 4006 0000 0a00 0001 0a09
0x0020: 0306 0401 01bb 0000 0001 0000 0000 5002
0x0030: 000a 0000 0000 0000 0000 0000
18:56:23.699676 ETH a0:36:9f:a1:4d:6d > 52:54:00:2e:62:a2 type 0x0800 (IP4)
IP4 10.0.0.2 > 10.9.3.6 ver 4 ihl 5 tos 0 len 46 id 0 flags 0 frag 0 ttl 64 proto 0x06 (TCP) cksum 0x0000
TCP 1025 > 443 seq# 1 ack# 0 offset 0x5 reserved 0x00 flags 0x02 [X|X|X|X|SYN|X] win 10 cksum 0x0000 urg 0
0x0000: 5254 002e 62a2 a036 9fa1 4d6d 0800 4500
0x0010: 002e 0000 0000 4006 0000 0a00 0002 0a09
0x0020: 0306 0401 01bb 0000 0001 0000 0000 5002
0x0030: 000a 0000 0000 0000 0000 0000
18:56:23.699797 ETH a0:36:9f:a1:4d:6d > 52:54:00:2e:62:a2 type 0x0800 (IP4)
IP4 10.0.0.3 > 10.9.3.6 ver 4 ihl 5 tos 0 len 46 id 0 flags 0 frag 0 ttl 64 proto 0x06 (TCP) cksum 0x0000
TCP 1025 > 443 seq# 1 ack# 0 offset 0x5 reserved 0x00 flags 0x02 [X|X|X|X|SYN|X] win 10 cksum 0x0000 urg 0
0x0000: 5254 002e 62a2 a036 9fa1 4d6d 0800 4500
0x0010: 002e 0000 0000 4006 0000 0a00 0003 0a09
0x0020: 0306 0401 01bb 0000 0001 0000 0000 5002
0x0030: 000a 0000 0000 0000 0000 0000
[Device: id=0] Sent 1481340 packets, current rate 1.48 Mpps, 758.39 MBit/s, 995.39 MBit/s wire rate. <======================
[Device: id=0] Sent 2963453 packets, current rate 1.48 Mpps, 758.81 MBit/s, 995.94 MBit/s wire rate.
[Device: id=0] Sent 4446076 packets, current rate 1.48 Mpps, 759.09 MBit/s, 996.31 MBit/s wire rate.
[Device: id=0] Sent 5928571 packets, current rate 1.48 Mpps, 759.02 MBit/s, 996.21 MBit/s wire rate.
[Device: id=0] Sent 7411068 packets, current rate 1.48 Mpps, 759.01 MBit/s, 996.20 MBit/s wire rate.
[Device: id=0] Sent 8893308 packets, current rate 1.48 Mpps, 758.90 MBit/s, 996.06 MBit/s wire rate.
[Device: id=0] Sent 10375803 packets, current rate 1.48 Mpps, 758.98 MBit/s, 996.17 MBit/s wire rate.
[Device: id=0] Sent 11857787 packets, current rate 1.48 Mpps, 758.73 MBit/s, 995.83 MBit/s wire rate.
[Device: id=0] Sent 13340283 packets, current rate 1.48 Mpps, 759.00 MBit/s, 996.19 MBit/s wire rate.
[Device: id=0] Sent 14822549 packets, current rate 1.48 Mpps, 758.86 MBit/s, 996.01 MBit/s wire rate.
[Device: id=0] Sent 16304759 packets, current rate 1.48 Mpps, 758.83 MBit/s, 995.97 MBit/s wire rate.
[Device: id=0] Sent 17787384 packets, current rate 1.48 Mpps, 759.09 MBit/s, 996.31 MBit/s wire rate.
[Device: id=0] Sent 19270007 packets, current rate 1.48 Mpps, 759.09 MBit/s, 996.31 MBit/s wire rate.
[Device: id=0] Sent 20751607 packets, current rate 1.48 Mpps, 758.52 MBit/s, 995.56 MBit/s wire rate.
[Device: id=0] Sent 22233975 packets, current rate 1.48 Mpps, 758.97 MBit/s, 996.15 MBit/s wire rate.
[Device: id=0] Sent 23716600 packets, current rate 1.48 Mpps, 759.09 MBit/s, 996.31 MBit/s wire rate.
[Device: id=0] Sent 25199223 packets, current rate 1.48 Mpps, 759.09 MBit/s, 996.31 MBit/s wire rate.
[Device: id=0] Sent 26681591 packets, current rate 1.48 Mpps, 758.93 MBit/s, 996.09 MBit/s wire rate.
[Device: id=0] Sent 28164087 packets, current rate 1.48 Mpps, 759.02 MBit/s, 996.22 MBit/s wire rate.
[Device: id=0] Sent 29646200 packets, current rate 1.48 Mpps, 758.82 MBit/s, 995.95 MBit/s wire rate.
[Device: id=0] Sent 31128696 packets, current rate 1.48 Mpps, 759.03 MBit/s, 996.22 MBit/s wire rate.
[Device: id=0] Sent 32611319 packets, current rate 1.48 Mpps, 759.09 MBit/s, 996.31 MBit/s wire rate.

Example using the mTCP-ported multithreaded apachebench for HTTP/HTTPS load testing
(apachebench is slightly more complex than epwget and has a bigger memory footprint, so it is not ideal for millions of concurrent connections when the hardware has limited memory, but it offers more features...)
root@pktgen:/usr/src/mtcp/apps/apache_benchmark_deprecated/support# LD_LIBRARY_PATH=.:/usr/src/mtcp/dpdk/lib:/usr/local/lib LD_PRELOAD=$* .libs/ab -n 1000000 -c 80000 -N 8 -L 64 10.9.3.6/

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
---------------------------------------------------------------------------------
Loading mtcp configuration from : /etc/mtcp/config/mtcp.conf
Loading interface setting
..............CUT........

[CPU 0] dpdk0 flows: 10000, RX: 18370(pps) (err: 0), 0.02(Gbps), TX: 37765(pps), 0.03(Gbps)
[CPU 1] dpdk0 flows: 10012, RX: 32126(pps) (err: 0), 0.03(Gbps), TX: 51169(pps), 0.04(Gbps)
[CPU 2] dpdk0 flows: 10004, RX: 26160(pps) (err: 0), 0.02(Gbps), TX: 45429(pps), 0.04(Gbps)
[CPU 3] dpdk0 flows: 10000, RX: 20821(pps) (err: 0), 0.02(Gbps), TX: 40208(pps), 0.03(Gbps)
[CPU 4] dpdk0 flows: 10033, RX: 22857(pps) (err: 0), 0.02(Gbps), TX: 41761(pps), 0.04(Gbps)
[CPU 5] dpdk0 flows: 10014, RX: 66358(pps) (err: 0), 0.06(Gbps), TX: 86359(pps), 0.07(Gbps)
[CPU 6] dpdk0 flows: 10000, RX: 24649(pps) (err: 0), 0.02(Gbps), TX: 44590(pps), 0.04(Gbps)
[CPU 7] dpdk0 flows: 10004, RX: 23713(pps) (err: 0), 0.02(Gbps), TX: 45680(pps), 0.04(Gbps)
[ ALL ] dpdk0 flows: 80067, RX: 235054(pps) (err: 0), 0.22(Gbps), TX: 392961(pps), 0.34(Gbps)
[CPU 0] dpdk0 flows: 10000, RX: 31844(pps) (err: 0), 0.03(Gbps), TX: 36374(pps), 0.03(Gbps)
[CPU 1] dpdk0 flows: 10054, RX: 36440(pps) (err: 0), 0.03(Gbps), TX: 44083(pps), 0.04(Gbps)
[CPU 2] dpdk0 flows: 10003, RX: 40730(pps) (err: 0), 0.04(Gbps), TX: 49276(pps), 0.04(Gbps)
[CPU 3] dpdk0 flows: 10000, RX: 28863(pps) (err: 0), 0.03(Gbps), TX: 35395(pps), 0.03(Gbps)
[CPU 4] dpdk0 flows: 10001, RX: 31824(pps) (err: 0), 0.03(Gbps), TX: 39832(pps), 0.04(Gbps)
[CPU 5] dpdk0 flows: 10048, RX: 61479(pps) (err: 0), 0.05(Gbps), TX: 64870(pps), 0.05(Gbps)
[CPU 6] dpdk0 flows: 10007, RX: 38790(pps) (err: 0), 0.04(Gbps), TX: 41937(pps), 0.03(Gbps)
[CPU 7] dpdk0 flows: 10006, RX: 39920(pps) (err: 0), 0.04(Gbps), TX: 41469(pps), 0.03(Gbps)
[ ALL ] dpdk0 flows: 80118, RX: 309890(pps) (err: 0), 0.29(Gbps), TX: 353236(pps), 0.30(Gbps)

root@pktgen:/usr/src/mtcp/apps/apache_benchmark_deprecated/support# LD_LIBRARY_PATH=.:/usr/src/mtcp/dpdk/lib:/usr/local/lib LD_PRELOAD=$* .libs/ab -n 1000000 -c 10000 -N 8 -L 64 https://10.9.3.6/


This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
---------------------------------------------------------------------------------
Loading mtcp configuration from : /etc/mtcp/config/mtcp.conf
Loading interface setting
....................CUT...............
[CPU 0] dpdk0 flows: 1000, RX: 2457(pps) (err: 0), 0.00(Gbps), TX: 3372(pps), 0.00(Gbps)
[CPU 1] dpdk0 flows: 1000, RX: 3159(pps) (err: 0), 0.01(Gbps), TX: 3759(pps), 0.01(Gbps)
[CPU 2] dpdk0 flows: 1000, RX: 2000(pps) (err: 0), 0.00(Gbps), TX: 3000(pps), 0.00(Gbps)
[CPU 3] dpdk0 flows: 1000, RX: 2000(pps) (err: 0), 0.00(Gbps), TX: 3000(pps), 0.00(Gbps)
[CPU 4] dpdk0 flows: 1000, RX: 2294(pps) (err: 0), 0.00(Gbps), TX: 3257(pps), 0.00(Gbps)
[CPU 5] dpdk0 flows: 1000, RX: 2264(pps) (err: 0), 0.00(Gbps), TX: 3229(pps), 0.00(Gbps)
[CPU 6] dpdk0 flows: 1000, RX: 2909(pps) (err: 0), 0.01(Gbps), TX: 2579(pps), 0.00(Gbps)
[CPU 7] dpdk0 flows: 1000, RX: 2286(pps) (err: 0), 0.00(Gbps), TX: 3234(pps), 0.00(Gbps)
[ ALL ] dpdk0 flows: 8000, RX: 19369(pps) (err: 0), 0.04(Gbps), TX: 25430(pps), 0.04(Gbps)
[CPU 0] dpdk0 flows: 1000, RX: 871(pps) (err: 0), 0.01(Gbps), TX: 712(pps), 0.00(Gbps)
[CPU 1] dpdk0 flows: 1000, RX: 895(pps) (err: 0), 0.01(Gbps), TX: 713(pps), 0.00(Gbps)
[CPU 2] dpdk0 flows: 1000, RX: 606(pps) (err: 0), 0.00(Gbps), TX: 512(pps), 0.00(Gbps)
[CPU 3] dpdk0 flows: 1000, RX: 662(pps) (err: 0), 0.00(Gbps), TX: 560(pps), 0.00(Gbps)
[CPU 4] dpdk0 flows: 1000, RX: 891(pps) (err: 0), 0.01(Gbps), TX: 736(pps), 0.00(Gbps)
[CPU 5] dpdk0 flows: 1000, RX: 881(pps) (err: 0), 0.01(Gbps), TX: 713(pps), 0.00(Gbps)
[CPU 6] dpdk0 flows: 1000, RX: 153(pps) (err: 0), 0.00(Gbps), TX: 151(pps), 0.00(Gbps)
[CPU 7] dpdk0 flows: 1000, RX: 872(pps) (err: 0), 0.01(Gbps), TX: 714(pps), 0.00(Gbps)
[ ALL ] dpdk0 flows: 8000, RX: 5831(pps) (err: 0), 0.03(Gbps), TX: 4811(pps), 0.01(Gbps)
