Introduction:
Imagine that you are at a bank, waiting to cash your cheque. Let's say there are 4 counters.
In this situation, which counter will you choose?
It is a no-brainer: you will go to the counter that is serving the fewest people, i.e. you select the counter that is least loaded.
Why?
Because that is where your request will be served the quickest, i.e. the time between the request (you getting the token) and the response (the cashier handing you the cash) is the shortest.
And, of course, you would never go to a counter that is closed.
Load Balancers do something similar.
They stand between clients and servers and distribute incoming client requests across the group of available backend servers.
Their goal is to spread the load so that no server is overwhelmed while others sit idle.
For example, a load balancer can route a client request to the least busy server (similar to you choosing the counter with the shortest queue).
In another scenario, the load balancer bypasses a server that is down and reroutes the request to one that is up and running.
Why are load balancers needed?
Availability and Scalability:
Sticking to the bank anecdote, if you see that one of the counters has no cashier, you simply choose the counter that has one.
Similarly, if some servers in the server farm go down, the load balancer automatically detects it (typically through periodic health checks) and reroutes traffic to the functioning servers (Availability).
This also lets us temporarily take servers out of rotation to upgrade or downgrade them, or add servers as demand grows, without interrupting clients (Scalability).
Going by the anecdote, you close the counter whose cash-counting machine needs fixing and redirect customers to another counter.
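To make the availability idea concrete, here is a minimal Python sketch of a server pool that keeps only healthy servers in rotation. The `Server` and `Pool` classes and the `mark_down`/`mark_up` methods are hypothetical names for illustration. In a real load balancer, an automated health checker would set the health flag rather than a person doing it by hand.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True  # hypothetical flag; a health checker would normally update it


class Pool:
    """A minimal pool that only hands out servers marked as healthy."""

    def __init__(self, servers):
        self.servers = list(servers)

    def _set_health(self, name, healthy):
        for server in self.servers:
            if server.name == name:
                server.healthy = healthy

    def mark_down(self, name):
        # Take a server out of rotation (crashed, or closed for an upgrade).
        self._set_health(name, False)

    def mark_up(self, name):
        # Put a server back into rotation.
        self._set_health(name, True)

    def available(self):
        # Only these servers are eligible to receive new requests.
        return [s.name for s in self.servers if s.healthy]


pool = Pool([Server("a"), Server("b"), Server("c")])
pool.mark_down("b")      # e.g. counter "b" is closed for repairs
print(pool.available())  # ['a', 'c']
pool.mark_up("b")
print(pool.available())  # ['a', 'b', 'c']
```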
Performance:
Load balancers route traffic to the least busy server, or to the one with the most spare capacity, so that requests are served quickly and no single server becomes a bottleneck.
How does load balancing work?
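There are two types of load balancing algorithms: static algorithms, which follow a fixed rule and ignore the servers' current state, and dynamic algorithms, which take the current load on each server into account.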
STATIC ALGORITHMS:
Round Robin:
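Here, requests are assigned to the servers in sequential order: the first request goes to the first server, the next to the second, and so on, wrapping back to the first server after the last.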
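Here is a minimal Python sketch of round robin. It assumes the servers are identified by plain strings. The names `RoundRobin` and `next_server` are illustrative and do not come from any particular library. The sketch uses `itertools.cycle` to walk the list in order and wrap around.

```python
from itertools import cycle


class RoundRobin:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        self._order = cycle(servers)  # s1, s2, s3, s1, s2, ...

    def next_server(self):
        return next(self._order)


lb = RoundRobin(["s1", "s2", "s3"])
print([lb.next_server() for _ in range(5)])  # ['s1', 's2', 's3', 's1', 's2']
```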
Weighted Round Robin:
Here, traffic is distributed based on the priority and capacity of each server, usually expressed as a weight, i.e. a server with higher capacity or priority receives more requests than the rest of the servers.
For example, more customers will be routed to the counter whose cashier is more efficient, i.e. quicker at processing customer requests.
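One way to implement weighted round robin is sketched below in Python. A simple approach repeats each server as many times as its weight, but that sends bursts of consecutive requests to the heaviest server. This sketch instead uses the "smooth" weighted round robin variant, which interleaves the picks while still honouring the weights over each full cycle. The class name and the weight values are assumptions for illustration.

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round robin: each server is picked in proportion to its weight."""

    def __init__(self, weights):
        self.weights = dict(weights)                 # server -> weight (capacity/priority)
        self.current = {s: 0 for s in self.weights}  # running score per server
        self.total = sum(self.weights.values())

    def next_server(self):
        # Every server earns its weight in score on each pick...
        for server, weight in self.weights.items():
            self.current[server] += weight
        # ...the highest score wins (ties go to the first server listed)...
        chosen = max(self.current, key=self.current.get)
        # ...and the winner pays back the total, so the others catch up.
        self.current[chosen] -= self.total
        return chosen


# "a" is the quick cashier with five times the capacity of "b" or "c".
lb = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
print([lb.next_server() for _ in range(7)])  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```

Over every 7 picks, "a" is chosen 5 times and "b" and "c" once each, matching the 5:1:1 weights.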
Hash method:
Here, the load balancer calculates a hash of the client IP address and, based on the computed value, assigns the request to a server. As long as the set of servers does not change, requests from the same client always land on the same server.
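A minimal Python sketch of IP-hash selection follows. It assumes client addresses arrive as strings. It uses SHA-256 from `hashlib` purely as a convenient stable hash, because Python's built-in `hash()` for strings is randomized per process and would not give the same mapping across restarts. The function name `pick_server` is hypothetical.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to a server deterministically."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into an integer and reduce it to an index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]
first = pick_server("203.0.113.7", servers)
again = pick_server("203.0.113.7", servers)
print(first == again)  # True: the same client always lands on the same server
```

Because the index is taken modulo the number of servers, adding or removing a server remaps most clients. Consistent hashing is the usual remedy when that matters.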
DYNAMIC ALGORITHMS:
Least connection:
The load balancer assigns the request to the server with the fewest active connections. This is a dynamic algorithm because the number of active connections per server changes over time.
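Here is a minimal Python sketch of least connections. It assumes the load balancer is told when each connection opens and closes. The `acquire`/`release` method names are hypothetical. Ties go to the server listed first.

```python
class LeastConnections:
    """Routes each new connection to the server with the fewest active ones."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # server -> open connection count

    def acquire(self):
        # Pick the server with the fewest open connections right now.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when a connection to `server` finishes.
        self.active[server] -= 1


lb = LeastConnections(["s1", "s2"])
c1 = lb.acquire()    # 's1' (both idle, first listed wins)
c2 = lb.acquire()    # 's2'
lb.release(c1)       # the connection on s1 finishes
print(lb.acquire())  # 's1' again, since it now has 0 connections vs 1 on s2
```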
Least response time:
The load balancer assigns the request to the server that is currently quickest at processing client requests, i.e. the one with the lowest recent response time.
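Here is one possible Python sketch of least response time. It assumes the load balancer measures how long each server takes to answer and smooths those measurements with an exponentially weighted moving average. The `alpha` smoothing factor and all names here are illustrative assumptions. Real load balancers differ in how they measure response time, and some also factor in active connection counts.

```python
class LeastResponseTime:
    """Routes requests to the server with the lowest smoothed response time."""

    def __init__(self, servers, alpha=0.5):
        self.alpha = alpha                    # weight given to the newest measurement
        self.avg = {s: 0.0 for s in servers}  # 0.0 until measured, so new servers get tried
        self.measured = set()

    def pick(self):
        return min(self.avg, key=self.avg.get)

    def record(self, server, seconds):
        if server not in self.measured:
            # First measurement: use it as-is.
            self.avg[server] = seconds
            self.measured.add(server)
        else:
            # Exponentially weighted moving average of response times.
            self.avg[server] = self.alpha * seconds + (1 - self.alpha) * self.avg[server]


lb = LeastResponseTime(["s1", "s2"])
lb.record("s1", 0.20)
lb.record("s2", 0.05)
print(lb.pick())       # 's2' (faster so far)
lb.record("s2", 0.60)  # s2 slows down: its average becomes about 0.325
print(lb.pick())       # 's1'
```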