I spent this morning gathering performance data for all of our major supported devices. The results are very interesting.
First, a note on methodology: these tests were intentionally performed against a distant (70ms) exit server and against real internet targets. In terms of pure iperf performance these devices can usually do about 30% more, but I wanted realistic usage numbers, including effects like TCP window scaling brought on by the high latency and distance. What's not realistic about this test is that the mesh links are perfect (Ethernet cables). I wanted to inspect the software stack that's under our direct control for problems before diving into antenna tuning.
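To put a number on why window scaling matters at this distance: a single TCP flow is capped at window / RTT, so without window scaling the classic 64 KB window tops out around 7.5 Mbit/s at 70 ms. A quick back-of-the-envelope sketch (figures are illustrative, not measurements from these tests):

```python
# Back-of-the-envelope TCP window math for a 70 ms path.
# Illustrative numbers only; not measurements from these tests.

RTT_S = 0.070            # round-trip time to the exit server
CLASSIC_WINDOW = 65535   # largest TCP window without window scaling (bytes)

# A single TCP flow is capped at window / RTT.
ceiling_mbps = CLASSIC_WINDOW * 8 / RTT_S / 1e6
print(f"64 KB window at 70 ms: ~{ceiling_mbps:.1f} Mbit/s ceiling")

# Window needed to sustain a given rate at this RTT.
for target_mbps in (100, 500):
    window_kib = target_mbps * 1e6 / 8 * RTT_S / 1024
    print(f"{target_mbps} Mbit/s at 70 ms needs a ~{window_kib:.0f} KiB window")
```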
Latency-wise, performance is flawless across the board: less than half a millisecond of additional latency per hop, meaning video calls or live streaming will remain viable even deep into the network.
In terms of throughput loss to meshing, things are also pretty positive. When meshing with a higher-performance gateway device, nodes actually saw throughput increase, which says good things about our packet scheduling and network design. On the other hand, deeper chains of identical devices have to obey queueing theory: throughput goes down as the chain gets longer, at a rate of about 5-10% per hop in the current tests, though these numbers aren't reliable enough to extrapolate from very far.
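Just to make that extrapolation concrete (with the caveat above that the per-hop figures are shaky), here is what a geometric 5-10% loss per extra hop plus ~0.5 ms of added latency per hop would look like; the 100 Mbit/s starting point is made up purely for illustration:

```python
# Rough extrapolation for a chain of identical devices: throughput decays
# geometrically at 5-10% per extra hop, latency grows ~0.5 ms per hop.
# The 100 Mbit/s base figure is hypothetical, for illustration only.

BASE_MBPS = 100.0
PER_HOP_LOSS = (0.05, 0.10)   # the (shaky) range seen in these tests
PER_HOP_LATENCY_MS = 0.5

for hops in (1, 3, 5, 8):
    best = BASE_MBPS * (1 - PER_HOP_LOSS[0]) ** (hops - 1)
    worst = BASE_MBPS * (1 - PER_HOP_LOSS[1]) ** (hops - 1)
    print(f"{hops} hops: ~{worst:.0f}-{best:.0f} Mbit/s, "
          f"+{PER_HOP_LATENCY_MS * hops:.1f} ms of mesh latency")
```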
The absolute value of mesh throughput is a little more complicated, and tied to WireGuard encryption performance and other device properties. Let's compare our two highest-performance devices: the EdgeRouter X using an MT7621 and the Turris Omnia using an Armada 38x. You may notice that the Omnia has about double the CPU clock speed and about double the performance.
While "double the clock, double the performance" is intuitively satisfying, it really shouldn't work out that way: the Armada also has an L2 cache (the L1 cache specs are the same on both devices), and ARM is typically considered to have a higher IPC (instructions per clock), so it should be pulling ahead by more than the clock ratio alone. It's hard to narrow down what's causing what here; my hunch is memory speed and a lack of DMA to make copying traffic around efficient. I should probably gather some cache-miss stats on the Omnia before reaching further conclusions.
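For the record, those cache numbers would come from something like a system-wide perf stat run; a rough sketch of the kind of thing I'd use (assuming perf is actually installed on the Omnia image, which a stock build won't have):

```python
import subprocess

# Hypothetical helper: run a system-wide perf stat for a while and dump the
# cache counters. Assumes perf is available on the target device.

def cache_stats(seconds=30):
    cmd = [
        "perf", "stat", "-a",
        "-e", "cache-references,cache-misses",
        "sleep", str(seconds),
    ]
    # perf prints its counter summary to stderr
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stderr)

if __name__ == "__main__":
    cache_stats()
```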
A supporting note: on the n750, 35% of the total CPU load is in sirq, i.e. servicing softirqs to copy data off of the NIC and into memory. This could indicate a lack of DMA (direct memory access), in which case higher clock frequencies would directly help a memory-bound workload.
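That sirq share falls straight out of /proc/stat if anyone wants to reproduce it; a minimal sketch (Python for readability, interval arbitrary) that approximates the "sirq" column top reports:

```python
import time

def cpu_jiffies():
    # First line of /proc/stat: aggregate jiffies since boot, in the order
    # user nice system idle iowait irq softirq steal ...
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

def softirq_share(interval=5.0):
    before = cpu_jiffies()
    time.sleep(interval)
    after = cpu_jiffies()
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta[:8])     # skip guest fields, already counted in user
    return delta[6] / total if total else 0.0   # 7th field is softirq

if __name__ == "__main__":
    # Run this on the router while traffic is flowing through it.
    print(f"softirq share of CPU time: {softirq_share():.1%}")
```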
So the results for the Armor Z2 set a new record for disappointing. The big question is why? By every objective measure its integrated processor is better than the one in the Omnia, which demonstrably performs well.
I'm leaning towards an issue with the port, and I'll have to dig more deeply to see if I can find exactly why. Both targets use WireGuard's NEON optimization. Next up on my to-test list are the following devices.
I'm most excited about the GL.iNet; while I generally haven't found their hardware to be the best in the world, it's always been well done and easy to work with. A quad-core ARM gives me high hopes for mesh performance.
@jkilpatr have you tried reaching out to the batman-adv folks about this? Their protocol necessitates processing along the route that possibly skips the hardware optimizations, so maybe they have experienced similar issues.
Now this is really interesting. See that little block over in the starting corner? That's all the decryption work. See the big stack on the right? That's our iptables rules.
But now this one totally contradicts that. Is ARM NEON really that much more efficient? In general, though, it seems we're not restricted by the performance of these kworkers at all, so do we even care? It seems that LEDE's lack of hardware offload in most cases is what causes most of our performance problems. Our entire encryption stack is usually only 5-10% overhead compared to raw LEDE. Guess it's my bad for not getting a good baseline.
That last one is the sysirq thread, so there's no WireGuard there.
The middle one is the n600, and I'm confused as to why WireGuard dominates there versus ip6tables; iptables is present but is much, much faster by comparison. Maybe the ARM NEON implementation of WireGuard is just that much faster? Maybe my random selection of kthreads caused sample bias?
Managed to monkey-patch in Qualcomm FastPath offload support on the n750; no change from the 30 Mbps TCP perf we normally see (strangely, UDP perf sucks no matter what, in a reversal of the norm). Rumor on the forums is that you can get an ERX to go from 100/100 shaped NAT speed to 500/500 with FastPath support, but I can't find a simple patch like I could for the n750. Overall I don't think it's worth the current time investment. I'll come back to it when we're really needing the speed boost; maybe then I can take the time to upstream it properly.