Since the NetScaler platform is in many cases set up as a reverse proxy, the configured Subnet IPs (or SNIPs) are used as the source address in the communication between the NetScaler and the backend servers. This implies that you might run out of source ports when communicating with the backend servers.
By default, TCP communication is set up between a source port on the requesting end and a destination port on the receiving end. Both port fields are 16 bits long within a TCP header, meaning there are 65,536 possible source ports available for TCP communication.
It also means that, per the TCP specification, one IP address cannot sustain more than 65,536 simultaneous connections to the same destination address and port.
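As a quick sanity check of the arithmetic, the 16-bit port field directly yields the 65,536 figure used throughout this article:

```python
# The TCP source and destination ports are 16-bit header fields,
# so the usable port space per IP address is 2^16 values.
PORT_BITS = 16
PORT_SPACE = 2 ** PORT_BITS
print(PORT_SPACE)  # 65536
```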
Source ports on the NetScaler
Since version 8.1 of the NetScaler platform, the handling of source ports is done by the NetScaler itself. The effect is that the NetScaler can use one IP address to sustain 16 × 65,536 = 1,048,576 server connections. To make this possible, a few remarks apply:
- The NetScaler cannot set up more than 65,536 server connections (per SNIP/MIP) to the same service (service group member).
TCP multiplexing dampens the number of server connections needed. However, it relies on the backend server being able to answer a request quickly. If no answer has been sent back to the NetScaler but a new request must be sent, the NetScaler will set up a new server connection (in other words, occupy a new source port).
Typically, under normal load on the backend servers, the ratio between client connections (users connecting to the NetScaler) and server connections (the NetScaler connecting to the backend servers) will find a balance. Under heavy server load this balance is disturbed. Monitoring this ratio can therefore serve as an early warning signal when monitoring the NetScaler platform.
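A minimal sketch of what such a check could look like, assuming you already collect the two connection counters from your monitoring tooling (the function name and thresholds here are my own, not a NetScaler API):

```python
def mux_ratio(client_conns, server_conns):
    """Client/server connection ratio: the TCP multiplexing efficiency."""
    return client_conns / server_conns

# Under normal load the ratio might hover around 5/1;
# a drop toward 1/1 signals the backends are answering slowly.
healthy = mux_ratio(1_000_000, 200_000)   # 5.0
degraded = mux_ratio(1_000_000, 500_000)  # 2.0 -> worth investigating
print(healthy, degraded)
```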
How this translates to large-scale implementations
(The numbers and ratios mentioned here are dependent on the implementation.)
With lots of users connecting to the NetScaler, each user normally needs multiple TCP connections (say 10) to the platform. With 100,000 users connecting, the NetScaler will handle 1,000,000 inbound TCP connections; these connections are load balanced across, say, 10 backend servers, using one SNIP address.
Under “normal load”, the ratio of client connections to server connections balances out at 5/1 (the TCP multiplexing efficiency), leaving the NetScaler to maintain 200,000 server connections, or 20,000 connections per backend server. This will work.
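The normal-load scenario above boils down to a few lines of arithmetic:

```python
users = 100_000
conns_per_user = 10
client_conns = users * conns_per_user      # 1,000,000 inbound connections
mux_ratio = 5                              # client/server ratio under normal load
backends = 10

server_conns = client_conns // mux_ratio   # 200,000 server connections
per_backend = server_conns // backends     # 20,000 source ports per backend
print(server_conns, per_backend)
```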
Maintenance cycles, degraded backends
During backend maintenance, servers will reboot.
With one server rebooting:
- The NetScaler “loses” 65,536 source ports for communicating with the backend servers.
- Existing sessions will be diverted to the backend servers still available; load will increase on these machines, degrading the TCP multiplexing efficiency.
(Sample) The TCP multiplexing efficiency drops to 4/1:
- The NetScaler needs to maintain 250,000 server connections, with 9 backend servers still available
- Resulting in roughly 28,000 connections per backend server (an increase of about 40%!!!)
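Reusing the numbers from the normal-load scenario, the degraded case works out as follows:

```python
client_conns = 1_000_000
mux_ratio = 4                              # efficiency degraded from 5/1 to 4/1
backends = 9                               # one of ten servers rebooting

server_conns = client_conns // mux_ratio   # 250,000 server connections
per_backend = server_conns // backends     # ~27,800 per remaining backend
increase = per_backend / 20_000 - 1        # vs. 20,000 under normal load
print(server_conns, per_backend, round(increase * 100))  # ~39% increase
```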
Worst case scenario
In a worst case, the TCP multiplexing efficiency drops to 1/1. With only 10 backend servers, this would mean 100,000 server connections per backend server; that is more than the 65,536 source ports available!!
Even with 16 backend servers and a 1/1 ratio, we would still need 62,500 source ports per backend server. More than 16 backend servers would only help the NetScaler if it caused the TCP multiplexing ratio to change.
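A small check of when the port space per backend runs out at a 1/1 ratio:

```python
PORT_SPACE = 65_536
client_conns = 1_000_000        # 1/1 ratio: one server connection per client connection

for backends in (10, 16):
    per_backend = client_conns // backends
    exhausted = per_backend > PORT_SPACE
    print(backends, per_backend, exhausted)  # 10 backends exhaust the port space; 16 barely fit
```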
With this ratio degraded to 1/1, more serious things may be happening on the backend; it might be that the backends cannot handle all the incoming requests. Monitoring might start failing (timing out); more servers are taken offline, spiraling the setup into a panic mode where, on average, only part of the servers are alternately available.
Use IP sets in situations like these (an obvious conclusion, I know). Each additional IP address added as a source address for a service (group) adds 65,536 source ports per backend server: with 10 backend servers, that is 655,360 additional source ports (as long as the number of backend servers stays below 16, so the per-IP limit is not reached).
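The resulting capacity can be sketched as a simple product; the two source addresses here (one SNIP plus one extra IP-set address) are an assumed example, not a recommendation:

```python
PORT_SPACE = 65_536
backends = 10        # fewer than 16, so the per-IP cap of 16 * 65,536 is not hit
source_ips = 2       # one SNIP plus one additional address in the IP set (assumed)

capacity = source_ips * backends * PORT_SPACE
print(capacity)      # total usable source ports across the backend pool
```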
Keep track of the TCP multiplexing efficiency: a disturbance here can be an early warning sign of degrading backend performance.