I’m trying to improve my TCP throughput over a “high delay network” between Linux machines.
So far I have set:
tcp_rmem to "8192 7061504 7061504",
wmem_default to "7061504",
txqueuelen to 10000, and
tcp_congestion_control to "scalable".
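For reference, a sketch of the commands behind those settings (the interface name eth0 is an assumption; adjust values to your setup):

```shell
# Sketch of the settings above, assuming the NIC is eth0 (hypothetical name).
# tcp_rmem is "min default max" in bytes; the max bounds a single flow's window.
sysctl -w net.ipv4.tcp_rmem="8192 7061504 7061504"
sysctl -w net.core.wmem_default=7061504
sysctl -w net.ipv4.tcp_congestion_control=scalable
# Transmit queue length on the NIC:
ip link set dev eth0 txqueuelen 10000
```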
I’m using NIST Net (cnistnet) to simulate a delay of 100 ms, and the bandwidth I reach is about 200 Mbps (without the delay I reach about 790 Mbps).
I’m using iperf to perform the tests and tcptrace to analyze the results, and here is what I got:
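As an aside, on modern kernels the same one-way delay can be injected with tc netem instead of NIST Net; a sketch, again assuming the interface is eth0:

```shell
# Add a fixed 100 ms delay on egress of eth0 (assumed interface name):
tc qdisc add dev eth0 root netem delay 100ms
# Remove it again when the test is done:
tc qdisc del dev eth0 root netem
```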
On the receiver side:
max win adv: 5294720 bytes
avg win adv: 5273959 bytes
sack pkts sent: 0
On the sender side:
actual data bytes: 3085179704
rexmt data bytes: 9018144
max owin: 5294577 bytes
avg owin: 3317125 bytes
RTT min: 19.2 ms
RTT max: 218.2 ms
RTT avg: 98.0 ms
Why do I reach only 200 Mbps? I suspect the “owin” has something to do with it, but I’m not sure (these results are from a 2-minute test; a 1-minute test had an “avg owin” of 1552900)…
Am I wrong to expect the throughput to be almost 790 Mbps even with the 100 ms delay?
(I tried using bigger numbers in the window configurations but it didn't seem to have an effect)
It has been mentioned that, as Linux nowadays autotunes TCP settings, messing with the values will likely not improve things.
That being said, 100 ms together with a large bandwidth (at least 790 Mbps) leads to an enormous bandwidth-delay product, so maybe the autotuning decides that something is wrong and doesn't go far enough.
Try setting the iperf window size to really match the bandwidth-delay product of that link: avg RTT × 1 Gbps, roughly 0.1 s × 125 MB/s ≈ 12.5 MB. See if that improves things.
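A quick back-of-the-envelope check of that number (the 1 Gbps link rate and the iperf invocation are assumptions):

```shell
# Bandwidth-delay product: BW (bits/s) * RTT (s) / 8 = window size in bytes.
BW_BPS=1000000000   # assumed link rate: 1 Gbps
RTT_S=0.1           # ~100 ms simulated delay
BDP=$(awk -v bw="$BW_BPS" -v rtt="$RTT_S" 'BEGIN { printf "%d", bw * rtt / 8 }')
echo "BDP: $BDP bytes"   # 12500000 bytes, i.e. ~12.5 MB
# Then ask iperf for that window explicitly, e.g.:
# iperf -c <server> -w 12M -t 120
```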
The only way you can really start to understand what is going on is to get more data -- otherwise you are just guessing, or asking other people to guess. I recommend getting a system-level view (CPU, memory, interrupts, etc.) with
sar from the
sysstat package. Also, you should get a packet dump with Wireshark or tcpdump. You can then use Wireshark to analyze it, as it has a lot of tools for this: you can graph the window size over time, packet loss, etc.
Even a little packet loss on a high-latency link tends to hurt bandwidth quite a bit -- although since the link is simulated, loss would be a bit strange. Lots of small packets might also cause a high interrupt load (even though those might be simulated as well?).
So in short, get tcpdump and sar to see what is going on at the packet level and with your system resources.
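A sketch of the kind of capture and monitoring commands meant here (interface and port are assumptions; iperf defaults to port 5001):

```shell
# Capture only headers (96 bytes per packet) to keep the dump small; eth0 is assumed.
tcpdump -i eth0 -s 96 -w iperf-test.pcap port 5001
# In parallel, sample CPU, memory, and interrupt activity once per second:
sar -u -r -I SUM 1
```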
This is a common TCP issue known as the "long fat network" (LFN, or "long fat pipe") problem. If you Google that phrase together with TCP you'll find a lot of information on this problem and possible solutions.
This thread has a bunch of calculations and suggestions on tuning the Linux TCP stack for this sort of thing.
How much memory does this machine have? The
tcp_mem settings seem insane: they configure roughly 28 GB (7061504 × 4 KB pages) for TCP data globally. (But this is not your perf problem, since you most likely do not hit that limit in a few-socket test run. I just wanted to mention it, because copying the tcp_rmem/tcp_wmem values into tcp_mem is a very common misconception: tcp_mem is measured in pages, not bytes.)
The 7 MB you have configured as the default seems OK. The maximum, however, can go much higher on large-delay pipes. For testing, I would use 64 MB as the max value for
tcp_rmem; then you can rule out that this is your limiting factor. (This does bloat your buffers, so it only works if you have limited concurrency and the connection has low jitter and few drops.)
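Concretely, that test setting could look like this (64 MB = 67108864 bytes; min and default kept as in the question):

```shell
# Raise only the per-socket receive maximum to 64 MB for the test.
sysctl -w net.ipv4.tcp_rmem="8192 7061504 67108864"
# net.core.rmem_max caps what an explicit setsockopt(SO_RCVBUF) can request,
# so raise it too if the application sets its buffer size itself:
sysctl -w net.core.rmem_max=67108864
```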