Abstract
Server load balancing has
become an important function for both hosting
environments and a variety of other network
architectures. This paper introduces the concepts
behind server load balancing and explains
in detail how Numaria implements server load
balancing.
Load balancing was developed
to address the problem of overloaded servers.
A Server Load Balancer (SLB) is placed between
the client and a group of servers and configured
so that, although multiple servers may be
on one side of the SLB, they appear to be
one very large and powerful server that never
goes down. The SLB takes on the IP address
that the client is trying to contact, becoming
a Virtual Server that directs the client to
one of the servers in the load balanced group
of servers. The SLB may be a dedicated device
that uses software to make all traffic
management decisions, or it may be a multi-port
switching device, such as Numaria offers,
that uses hardware to perform these functions,
allowing for greater performance.
This paper highlights the
three areas that must be addressed to implement
a successful load-balanced environment: the
load balancing algorithm, the method for checking
server availability, and the method of ensuring
that client requests are directed to the same
server when required. We discuss these issues
and the solutions that Numaria delivers.
Load Balancing Algorithms
The first issue that must
be addressed is how traffic is balanced between
servers. As a client comes in, the SLB must
determine which server to connect the client
to for the session. The goal is to allocate
sessions in a relatively equal fashion so
that no single server receives an inordinate
amount of traffic, leaving other servers idle.
The simplest algorithm is
called "round-robin." When a request
arrives in this mode, it is sent to the next
real server in the pool of servers. This process
assumes two things: that all servers are equal
in power, and that all requests require the
same amount of effort for the server to fulfill.
When servers of different performance levels
are used, a straight round-robin algorithm
gives the slower servers the same number of
sessions as the faster servers, which is a
poor balance. The solution to having different
levels of performance in the machines in the
server pool is to implement a weighted round
robin algorithm, where a fractional weight
is given to each server and sessions are assigned
using these ratios. While this solves the
inequality of session allocation, not all
sessions generate the same load. The number
of sessions may be equal, but each of the
individual sessions could generate vastly
different server loads. Some may be intense
database queries or high bandwidth streaming
media sessions, while others may be just minimal
text downloads or a small GIF file. If a disproportionate
number of one type of session goes to a particular
server, that server may bog down, resulting
in poor response time or lost sessions, while
other servers in the load balanced pool remain
underutilized.
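The difference between plain and weighted round
robin can be illustrated with a minimal sketch.
It is not taken from any Numaria product: the
function names are hypothetical, integer weights
stand in for the fractional weights described
above, and the weighted version uses a "smooth"
credit scheme chosen only because it is short.

```python
from itertools import cycle


def round_robin(servers):
    """Yield servers in strict rotation, ignoring their capacity."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Yield server names in proportion to their integer weights.

    Each turn every server earns credit equal to its weight; the server
    holding the most credit is chosen and pays back the total weight.
    (Illustrative scheme, assumed for this sketch.)
    """
    credit = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        for name, weight in weights.items():
            credit[name] += weight
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total
        yield chosen
```

With weights of 3 and 1, six of every eight
sessions go to the faster server. Neither function
looks at how much work a session generates, which
is the limitation the paragraph above describes.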
One solution to this problem
is to try to measure server utilization. This
is done either by load balancing based on
the number of open sessions a server has,
which is known as the "least load"
algorithm, or by keeping track of the response
time of each server and balancing based on
fastest response. Other options include keeping
a ranking of the combination of fastest response
and least load, or tracking this information
over time and ranking servers by whether these
values are increasing or decreasing, and using
these rankings to select the server to which
a client session is assigned. Some devices
allow for load balancing based on the URL.
Besides the obvious limitations of working
only for HTTP traffic, there are concerns
about the delayed binding and the maintenance
of the URL to the server-binding table. Each
new session requires additional software
processing in the SLB. As traffic
increases and more software features are
enabled in the SLB device, the SLB can become
the bottleneck. The power and the number of
servers, along with the amount of network
traffic, all come into play in optimizing
the system's performance levels when selecting
the load-balancing algorithm.
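The utilization-based choices above can be
sketched as follows. The Server record, its field
names, and the way the two measures are combined
(summing each server's rank position) are
assumptions made for illustration; the paper does
not specify how a real device combines them.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    open_sessions: int = 0
    avg_response_ms: float = 0.0


def least_load(servers):
    """Pick the server with the fewest open sessions."""
    return min(servers, key=lambda s: s.open_sessions)


def fastest_response(servers):
    """Pick the server with the lowest tracked response time."""
    return min(servers, key=lambda s: s.avg_response_ms)


def combined_rank(servers):
    """Pick the server with the best summed rank across both measures."""
    by_load = sorted(servers, key=lambda s: s.open_sessions)
    by_time = sorted(servers, key=lambda s: s.avg_response_ms)
    load_rank = {s.name: i for i, s in enumerate(by_load)}
    time_rank = {s.name: i for i, s in enumerate(by_time)}
    return min(servers, key=lambda s: load_rank[s.name] + time_rank[s.name])
```

Each function is a single pass or sort over the
pool, so the cost per new session grows with the
number of servers rather than with the payload,
in contrast to URL-based balancing.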
One final part of SLB algorithm
selection is ensuring availability. One feature
an SLB must have to perform at top levels
is a Maximum Session Threshold per server.
This allows the system administrator to select
how many sessions at most will be assigned
to a given server. Once a server has reached
its maximum, no new sessions are assigned
to it until its number of sessions drops below
the maximum threshold. The threshold control feature
ensures that a server won't receive too many
sessions and become overloaded. It also allows
two SLB devices to run in either an Active-Passive
mode where one device is running in a standby
mode for the other, or in an Active-Active
mode where both devices are acting as backups
for each other and both are load balancing
sessions to the same server by using different
real port numbers on the servers, without
danger of overload. If either SLB were to
fail, the other would have enough bandwidth
available to pick up the slack without causing
a cascade of failures that takes the site
down and costs revenue.
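A Maximum Session Threshold can be pictured as a
filter applied before the balancing algorithm
runs. The sketch below is an assumption-laden
illustration: the function name and the two
dictionaries are hypothetical, and least load is
used as the underlying algorithm only for brevity.

```python
def pick_with_threshold(sessions, max_sessions):
    """Return the least-loaded server still below its session threshold.

    sessions:     server name -> current number of open sessions
    max_sessions: server name -> Maximum Session Threshold
    Returns None when every server has reached its threshold.
    """
    eligible = [name for name, count in sessions.items()
                if count < max_sessions[name]]
    if not eligible:
        return None
    return min(eligible, key=sessions.get)
```

A server at its threshold simply drops out of the
candidate list and rejoins once its session count
falls below the limit.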
Session Persistence
The load-balancing algorithm
spreads the load and risk across multiple
servers; each flow from a client is processed
by the algorithm and assigned accordingly.
But problems abound. For example, downloading
a Web page, entering information, loading
a shopping cart, and purchasing items are
all considered to be part of one session for
a client. But for an SLB, these are considered
to be tens or hundreds of individual sessions
or flows. A Web page consists of many elements
or objects, each of which is requested separately.
Filling a shopping cart is done by viewing
multiple Web pages and entering data where
desired. Making a purchase requires moving
from HTTP to a secure SSL mode and then back again.
In addition, the shopping cart information
usually is stored on the same server as the
SSL session. Without session persistence,
the SLB would see all these flows as distinct
events to be load balanced, and the shopping
cart information would be scattered over the
pool of servers.
The solution is to send the
client to the same server each time. In an
ideal world, this would be accomplished by
looking at the client's IP address and matching
it to previously assigned flows. If a match
exists, the client is sent to the same server;
otherwise, the load-balancing algorithm of choice
assigns the client to a server. Client-to-server bindings
should have a timeout feature that enables
a client to visit other sites and still return
and connect to the same server, without being
assigned to an entirely new server and losing
previously entered data.
Most sites mix applications,
using HTTP for Web pages, SSL for secure transactions,
and an audio or video engine for media streaming.
Because each of these sessions uses different
port numbers, each is considered by an SLB
to be a distinct session. With Sticky Ports,
however, the SSL session will be assigned
the same server as the HTTP session. This
is accomplished by enabling
the feature during installation of the virtual
server. The software allows the administrator
to select a configuration that associates
multiple application port numbers with one another.
When a new session arrives, the SLB looks
to see if a session binding to a real server
exists for the client IP address and the virtual
server IP and port number combination, or
any of the other virtual server port numbers
in the sticky port grouping. If a binding
already exists between the client and a server,
then the new session is sent to the same server.
If there is no current binding, then the load
balancing algorithm selects to which server
the client session should be sent.
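The lookup described above, together with the
binding timeout mentioned earlier, might be
sketched like this. The BindingTable class, its
parameters, and the injected clock are all
hypothetical names introduced for illustration.

```python
import time


class BindingTable:
    """Client-to-server bindings with a sticky port group and a timeout."""

    def __init__(self, sticky_ports, timeout_s, choose_server,
                 clock=time.monotonic):
        self.sticky_ports = frozenset(sticky_ports)
        self.timeout_s = timeout_s
        self.choose_server = choose_server  # the load-balancing algorithm
        self.clock = clock
        self.bindings = {}  # key -> (server, time last seen)

    def _key(self, client_ip, virtual_ip, port):
        # All ports in the sticky group share one binding key.
        group = "sticky" if port in self.sticky_ports else port
        return (client_ip, virtual_ip, group)

    def route(self, client_ip, virtual_ip, port):
        key = self._key(client_ip, virtual_ip, port)
        now = self.clock()
        entry = self.bindings.get(key)
        if entry is not None and now - entry[1] <= self.timeout_s:
            server = entry[0]              # existing binding: same server
        else:
            server = self.choose_server()  # no binding: run the algorithm
        self.bindings[key] = (server, now)
        return server
```

With ports 80 and 443 in the sticky group, an SSL
session from a client lands on the server already
handling that client's HTTP session, while a port
outside the group is balanced independently.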
Another issue that must be
addressed arises when a client goes through a
proxy server. Whether as a security precaution
or as a way to save public IP address numbers,
some proxy servers make all traffic coming
from the network they are serving appear to
originate from the same IP address. This
is done using a technique known as Network
Address Translation (NAT). When a proxy draws
on a pool of addresses, it is possible
that a client may use one IP address for HTTP
traffic and another for the SSL (or other
port) traffic. The SLB would see this as traffic
coming from two different clients and potentially
assign the supposed clients to different servers,
causing shopping cart data to be unavailable
for the checkout application. This problem
is solved using one of two techniques: delayed
binding or Intrinsic Persistence Checking.
In a delayed binding mode,
the SLB actually initiates a TCP session with
each new flow request. The client thinks it's
talking to the end server and starts to send
data to the SLB, which reads the first packet
of information and looks for client-specific
information. In an HTTP mode, the SLB looks
for "cookies" that it or one of
the servers has inserted. In an SSL mode,
by comparison, the SLB looks at the SSL session
ID. In either case, the SLB compares this
information with its stored table of server
bindings and picks the real server to which
the client should go. The SLB then initiates
a session with the server, looking like the
client, and connects the two together. This
is an extremely software-intensive process
that puts a limit on the throughput of the
SLB and currently works only with SSL or HTTP
sessions. In addition, the Sticky Port feature
must be running to ensure that the SSL and
HTTP traffic goes to the same server.
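To show why delayed binding is software-intensive,
here is a rough sketch of the HTTP case only: the
SLB must parse the first request before it can
choose a server. The function name, the cookie
name passed in, and the bindings dictionary are
hypothetical; the SSL session ID case is not shown.

```python
from http.cookies import SimpleCookie


def server_from_first_request(request_bytes, cookie_name, bindings):
    """Return the bound server named by a cookie in the first request.

    bindings maps cookie values to real servers. Returns None when the
    cookie is absent or unknown, so the caller can fall back to the
    load-balancing algorithm.
    """
    head = request_bytes.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:      # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "cookie":
            jar = SimpleCookie()
            jar.load(value.strip())
            if cookie_name in jar:
                return bindings.get(jar[cookie_name].value)
    return None
```

Every new flow pays for a TCP handshake with the
SLB plus this kind of payload inspection before
any server is contacted.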
Numaria's SLBs use real-time
Intrinsic Persistence Checking in conjunction
with the Sticky Port feature. Instead of using
extrinsic information contained in the data
payload, the SLB uses the intrinsic information
in the packet header to know where to send
client sessions. Traffic to an SLB comes from
all over the Internet, meaning that, on average,
traffic is coming from hundreds or even thousands
of different proxy servers. Each of these
proxies uses contiguous pools of IP addresses
to assign to clients as they access the Internet.
We know that a client will
be given an address from a specific range.
We also know that these ranges of addresses
never overlap. We can use this intrinsic information
contained in every packet header to make a
real-time, accurate decision as to which server
the client should be connected to. This Intrinsic Persistence
Checking is accomplished by applying a netmask
to the client IP address and comparing the
result to existing client/server bindings.
If one exists already, then the client is
sent to the same server; otherwise, the selected
SLB algorithm will choose the server.
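The netmask comparison can be sketched with the
Python standard library. The function names and
the 24-bit prefix length are assumptions chosen
for illustration; the appropriate mask depends on
the address pools in use.

```python
import ipaddress


def persistence_key(client_ip, prefix_len=24):
    """Apply a netmask so all addresses in one pool share a key."""
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return str(network.network_address)


def route(client_ip, bindings, choose_server, prefix_len=24):
    """Reuse the binding for the masked address, or create a new one."""
    key = persistence_key(client_ip, prefix_len)
    if key not in bindings:
        bindings[key] = choose_server()  # fall back to the SLB algorithm
    return bindings[key]
```

Two requests arriving from 203.0.113.17 and
203.0.113.200 both reduce to 203.0.113.0 and so
reach the same server, using only header fields.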
Comprehensive Server Checking
The last key element for
implementing a successful SLB environment
is comprehensive server checking designed
to verify that the server is up and functioning
properly. Aliveness checking entails more
than pinging the device and waiting for a
response, as the server could very well be
up but the application that is servicing customer
requests could be down. The applications and
databases on the server, the connections to
a backend database server, and the ability
of that server to supply data must all be
checked to guarantee that customers receive
the highest levels of service.
We address these requirements
through Comprehensive Server Checking, where
the SLB sends each server a request to execute
a CGI script. The CGI script checks the server
side applications and ability to talk to backend
databases. If everything is working properly,
the CGI script sends an "OK" message
to the SLB; otherwise, it either sends a failure
message or times out. If the server still
fails to respond properly after a pre-selected
number of tries, it is taken out of the queue
and no new sessions are sent to it. The truly
unique feature here is that, regardless of
what kind of application is being load-balanced
on the server, administrators can configure
CGI scripts to verify that the server is up
and running.
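The checking loop described above might look like
the following sketch. The class and function
names, the health-script URLs, and the default of
three tries are hypothetical. Returning a server
to service after a successful check is also an
assumption, since the text describes only removal.

```python
import urllib.request


def cgi_probe(url, timeout_s=2.0):
    """Return True when the health script's reply starts with 'OK'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.read(16).strip().startswith(b"OK")
    except Exception:
        # Errors and timeouts both count as a failed check.
        return False


class HealthTracker:
    """Count consecutive failed checks per server health URL."""

    def __init__(self, urls, probe=cgi_probe, max_failures=3):
        self.probe = probe
        self.max_failures = max_failures
        self.failures = {url: 0 for url in urls}

    def check_all(self):
        for url in self.failures:
            if self.probe(url):
                self.failures[url] = 0   # assumed: success resets the count
            else:
                self.failures[url] += 1

    def in_service(self):
        """Servers that may still receive new sessions."""
        return [url for url, count in self.failures.items()
                if count < self.max_failures]
```

Because the probe is just a function, the same
tracker works for any application for which an
administrator can write a health script.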
Comprehensive Server Checking
ensures that all the paths a client request
can take are checked and working. If a device
sent only a ping or an open and close on a
port, it would check only whether a port or stack
is working, not whether the application is
actually working and the database is available.
If a device uses an external server for checking, it is
merely checking the path that external server has to
the servers under test, and not the path
that the client would take. Either case can
potentially lead to a server being marked
as up and running, when it is actually performing
improperly for the client.
Reliability
The final piece of the puzzle
is SLB reliability. The Hitless Protection
system provides the primary defense against control
module failures by removing the need for
a reboot when a control module fails. In addition,
two SLB devices can be deployed so that if
one has a dual control module failure, the
second is available to process client requests.
This is done using the Virtual Router Redundancy
Protocol (VRRP). With VRRP, the backup
device monitors the primary device and takes
over processing traffic if a failure occurs.
Stateful fail-over, finally, adds a third
layer of protection. As bindings between clients
and servers are created and cleared, the information
is sent to the backup device, which has all
the information required to pass traffic from
clients to the appropriate server if a failure
should occur.
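The idea behind stateful fail-over can be shown
with a toy model in which every binding change on
the primary is forwarded to the backup. The class
names and the event format are hypothetical, and
the direct method call stands in for whatever
transport a real pair of devices would use.

```python
class BackupSLB:
    """Holds a mirror of the primary's client-to-server bindings."""

    def __init__(self):
        self.bindings = {}

    def apply(self, event):
        operation, client, server = event
        if operation == "create":
            self.bindings[client] = server
        elif operation == "clear":
            self.bindings.pop(client, None)


class PrimarySLB:
    """Forwards each binding change to the backup as it happens."""

    def __init__(self, backup):
        self.bindings = {}
        self.backup = backup

    def bind(self, client, server):
        self.bindings[client] = server
        self.backup.apply(("create", client, server))

    def clear(self, client):
        self.bindings.pop(client, None)
        self.backup.apply(("clear", client, None))
```

If the primary fails, the backup already holds the
same table and can keep sending each client to the
server it was bound to.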

By any measure, server load
balancing entails much more than simply redirecting
client traffic to multiple servers. In order
to implement it correctly, the SLB device
must have features such as Intrinsic Persistence
Checking, Comprehensive Server Checking, and
Stateful Fail-over Redundancy. All these capabilities
need to scale as the volume of network traffic
increases so that they do not become bottlenecks
or potential points of failure. In response
to these needs, the Load Balancing Switch
provides the performance and scalability that
network administrators need to maintain optimal
performance, no matter how large the network.
Numaria provides its customers with
service provider infrastructure that yields
increased network reliability, improved performance,
and enhanced services.