Overview
When running performance benchmarks that compare a direct connection to Amazon RDS against a connection to RDS through ScaleArc, you may observe reduced throughput when connecting through ScaleArc.
Solution
Diagnosis
This happens because only one CPU is handling all of the network interrupts, so network requests bottleneck on that single CPU. You can verify this by running cat /proc/interrupts
which produces output like the following.
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15 CPU16 CPU17 CPU18 CPU19 CPU20 CPU21 CPU22 CPU23 CPU24 CPU25 CPU26 CPU27 CPU28 CPU29 CPU30 CPU31 CPU32 CPU33 CPU34 CPU35
25: 11386561 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-0
26: 12290463 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-1
27: 12636327 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-2
28: 10177783 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-3
29: 12500283 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-4
30: 12193507 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-5
31: 13161420 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-6
32: 13479516 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-7
33: 227665 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge nvme0q0, nvme0q1
The above output shows the network-related interrupts. Note that all counters on CPUs other than CPU0 are zero. More information about interpreting this output can be found on the Red Hat portal.
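Because the raw output is wide and hard to scan, one way to confirm the skew is to total the eth0 counters per CPU with a short awk one-liner. This is a minimal sketch: it assumes the interface is named eth0 and that each eth0 row ends with the two label fields shown above (for example "PCI-MSI-edge eth0-Tx-Rx-0").
# Sum the eth0 interrupt counters column by column; field 1 is the IRQ number,
# so CPU0 starts at field 2 and the last two fields are the controller/queue labels.
# Only CPUs with a non-zero total are printed.
grep eth0 /proc/interrupts | awk '
  { n = NF - 3; for (i = 2; i <= NF - 2; i++) sum[i - 2] += $i }
  END { for (c = 0; c < n; c++) if (sum[c] > 0) printf "CPU%d: %d\n", c, sum[c] }'
In the bottlenecked state this prints a single line for CPU0, confirming that no other core is servicing the NIC queues.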
Additionally, running top
reveals ksoftirqd/0 consuming almost 100% of a CPU, where the 0 in the process name is the core number.
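For a non-interactive check, top can be run in batch mode and filtered down to the softirq threads. This is a minimal sketch; the exact column layout can differ between top versions.
# One batch iteration of top, keeping the process-table header and the ksoftirqd threads.
# A ksoftirqd/0 entry pinned near 100% CPU confirms that core 0 is saturated by softirqs.
top -b -n 1 | grep -E '%CPU|ksoftirqd'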
Steps To Fix
This situation can be resolved by enabling irqbalance
: start it with systemctl start irqbalance
and, once started, check that it is running with systemctl status irqbalance
(see the sketch below)
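The sequence below shows one way to start and verify the service, assuming a systemd-based system on which the irqbalance package is already installed; the optional enable step is an addition that makes the service persist across reboots.
# Start the IRQ-balancing daemon and confirm it is active.
systemctl start irqbalance
systemctl status irqbalance
# Optional: start irqbalance automatically on every boot.
systemctl enable irqbalance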
. irqbalance rebalances the IRQs every 10 seconds. Listing /proc/interrupts
again after enabling irqbalance
produces output like the following, which shows that all available CPUs now handle interrupts (note the non-zero counters on CPUs other than CPU0).
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15 CPU16 CPU17 CPU18 CPU19 CPU20 CPU21 CPU22 CPU23 CPU24 CPU25 CPU26 CPU27 CPU28 CPU29 CPU30 CPU31 CPU32 CPU33 CPU34 CPU35
25: 11390046 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 336 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-0
26: 12293634 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 209 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-1
27: 12640281 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 499 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-2
28: 10180840 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 179 0 0 0 PCI-MSI-edge eth0-Tx-Rx-3
29: 12504005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 377 0 0 PCI-MSI-edge eth0-Tx-Rx-4
30: 12196786 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 334 0 PCI-MSI-edge eth0-Tx-Rx-5
31: 13164527 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 248 PCI-MSI-edge eth0-Tx-Rx-6
32: 13482947 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-Tx-Rx-7
33: 232455 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge nvme0q0, nvme0q1
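To confirm that interrupt handling stays distributed over time, the counters can be re-read periodically. This is a minimal sketch; eth0 is the interface name taken from the output above.
# Re-read the header and eth0 rows of /proc/interrupts every 10 seconds,
# matching irqbalance's default rebalance interval.
watch -n 10 "grep -E 'CPU|eth0' /proc/interrupts"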
Enabling irqbalance
resolves this issue because it balances the IRQs across the available CPUs. There are a few other measures that can be taken to improve performance, such as increasing the number of cores/vCPUs by moving to a larger instance size, creating multiple clusters pointing to the same RDS instance, and using an AWS Elastic Load Balancer in front of the ScaleArc instance to balance traffic across all of the clusters.
Testing
Once the steps above are performed, re-run the performance testing tool and confirm that throughput through ScaleArc has improved.
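The article does not name a specific benchmarking tool; as one hypothetical example, a sysbench OLTP run can be pointed at the ScaleArc endpoint and then directly at the RDS endpoint so the two throughput figures can be compared. The hostnames, credentials, and table sizes below are placeholders, and a MySQL-compatible RDS instance is assumed.
# Prepare the test tables, then run the benchmark against the ScaleArc endpoint.
# Repeat the run with --mysql-host pointed at the RDS endpoint and compare the
# transactions-per-second figures reported at the end of each run.
sysbench oltp_read_write \
  --mysql-host=scalearc.example.com --mysql-port=3306 \
  --mysql-user=bench --mysql-password=secret --mysql-db=sbtest \
  --tables=8 --table-size=100000 --threads=64 --time=300 \
  prepare
sysbench oltp_read_write \
  --mysql-host=scalearc.example.com --mysql-port=3306 \
  --mysql-user=bench --mysql-password=secret --mysql-db=sbtest \
  --tables=8 --table-size=100000 --threads=64 --time=300 \
  run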