
I Bet You Didn't Know These Load Balancing Techniques

Load balancing is one of the critical pillars on which distributed systems thrive, but for most of us, it starts and ends with spinning up a node running NGINX and calling it a day.

And honestly, that works.

Until it doesn’t.

As systems grow, traffic becomes global, failures become more frequent, and latency starts to matter. At that point, traditional application-level load balancing begins to show its limits.

Thankfully, there are a lot of other fast and reliable load-balancing techniques hiding beneath the surface of the internet. Some of them power global CDNs, Kubernetes itself, and even the DNS system we all depend on.

Let’s walk through a few underrated load-balancing techniques that are worth keeping in mind as your systems scale.

Anycast Load Balancing (The Free Load Balancer)

Anycast has to be one of the most powerful, efficient, yet underused methods of load balancing.

It’s heavily used by companies like Google, Cloudflare, and public DNS resolvers across the internet, yet it’s rarely discussed outside infrastructure circles.

Instead of routing traffic to a single IP address in one location, the same IP address is advertised from multiple locations around the world. The internet’s routing system then automatically sends users to the nearest healthy node. This behavior is driven by the Border Gateway Protocol (BGP), which decides how traffic flows between networks across the internet.

What makes Anycast special is that it isn’t dependent on a single provider or a centralized load balancer. It’s backed by the internet itself. As a result, you get global load balancing without centralized control, automatic failover, and low latency by default, since users are typically routed to a nearby location.

How do I make my own?

There are two practical ways to set up Anycast.

The first approach is to run Anycast yourself. For this, you need an Autonomous System Number (ASN) and a routable IP prefix that you’re allowed to announce from multiple locations. You deploy the same application on servers in different regions and use a BGP daemon such as BIRD or FRRouting to advertise the same IP address from each server. When a server or network link goes down, its route is withdrawn and traffic automatically shifts elsewhere, without any explicit failover logic.
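
If you want a feel for the moving parts, here’s a rough sketch of the health-gated announcement logic, written as a script for ExaBGP (a BGP daemon with a scriptable API; BIRD and FRRouting achieve the same effect through configuration and route filters instead). The prefix and the local health-check address are made-up placeholders:

```python
#!/usr/bin/env python3
# Sketch: health-gated anycast announcement via ExaBGP's process API.
# ExaBGP runs this script and reads route commands from its stdout.
# The prefix and local health-check address below are placeholders.
import socket
import sys
import time

ANYCAST_PREFIX = "203.0.113.10/32"   # hypothetical anycast service IP
HEALTH_ADDR = ("127.0.0.1", 8080)    # hypothetical local health endpoint

def healthy() -> bool:
    """Treat the node as healthy if the local service accepts TCP connections."""
    try:
        with socket.create_connection(HEALTH_ADDR, timeout=1):
            return True
    except OSError:
        return False

announced = False
while True:
    up = healthy()
    if up and not announced:
        sys.stdout.write(f"announce route {ANYCAST_PREFIX} next-hop self\n")
        announced = True
    elif not up and announced:
        sys.stdout.write(f"withdraw route {ANYCAST_PREFIX} next-hop self\n")
        announced = False
    sys.stdout.flush()
    time.sleep(5)
```

The detail that matters is the withdrawal: the prefix is only advertised while the local service can actually accept connections, so an unhealthy node simply drops out of the routing table.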

The second, and far more common, approach is to consume Anycast through a cloud or edge provider. CDNs and global edge platforms already operate massive Anycast infrastructures, so you simply deploy your application behind their Anycast IPs. You still get global traffic steering and resilience, without managing BGP, ASNs, or IP ownership yourself. This is how most teams use Anycast in practice.

DNS-Based Load Balancing (Simple, Cheap, and Still Effective)

DNS is one of the earliest forms of load balancing, and it’s still surprisingly useful.

Instead of returning a single IP address, DNS can return multiple IPs, or dynamically choose one based on location, latency, or configured weights. While DNS operates outside the request path and is therefore slower to react, it provides a great deal of flexibility and scales extremely well.

Common techniques include:

  • Geo-based routing
  • Latency-based routing
  • Weighted round robin

This approach is still widely used in enterprise systems because it’s simple, inexpensive, and highly scalable. The tradeoff is that DNS responses are cached at many layers, so failover is eventual, not immediate. Changes take time to propagate, which is why outages so often end with the phrase: “It was the DNS.”
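
To make the weighted variant concrete, here’s a minimal sketch of the selection logic behind weighted records. Managed DNS providers implement this on their authoritative servers; the IPs and weights below are made up for illustration:

```python
import random

# Hypothetical record pool: higher weight means a larger share of answers.
RECORDS = [
    ("203.0.113.10", 70),   # large primary region
    ("203.0.113.20", 20),   # smaller secondary region
    ("203.0.113.30", 10),   # canary or low-capacity region
]

def pick_record() -> str:
    """Answer one query with an IP chosen in proportion to its weight."""
    ips, weights = zip(*RECORDS)
    return random.choices(ips, weights=weights, k=1)[0]

# Over many queries the answers converge to roughly a 70/20/10 split.
print(pick_record())
```

Because resolvers cache the answers, the split is only ever approximate, which is exactly the eventual-consistency tradeoff described above.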

How do I make my own?

Setting up DNS-based load balancing is relatively straightforward. You configure your DNS provider to return different IP addresses for the same domain based on routing rules, and associate health checks so unhealthy endpoints are removed over time. Managed DNS platforms like Route 53 and Cloudflare provide built-in support for geo routing, latency routing, weighted records, and health checks. For smaller setups, even simple round-robin A records with a low TTL can act as a basic load-balancing layer.
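
As a rough example of what that looks like on Route 53, the snippet below upserts two weighted A records for the same name using boto3, each tied to a health check. The zone ID, health check IDs, and addresses are placeholders, so treat it as a sketch of the API shape rather than a complete setup:

```python
import boto3

route53 = boto3.client("route53")

ZONE_ID = "Z0000000000000EXAMPLE"   # placeholder hosted zone ID

def weighted_record(ip: str, identifier: str, weight: int, health_check_id: str) -> dict:
    """Build one weighted A record for www.example.com."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.example.com",
            "Type": "A",
            "SetIdentifier": identifier,       # distinguishes records sharing a name
            "Weight": weight,                  # relative share of responses
            "TTL": 60,                         # low TTL so changes take effect sooner
            "HealthCheckId": health_check_id,  # failing checks drop the record from answers
            "ResourceRecords": [{"Value": ip}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId=ZONE_ID,
    ChangeBatch={
        "Changes": [
            weighted_record("203.0.113.10", "us-east", 80, "placeholder-health-check-1"),
            weighted_record("203.0.113.20", "eu-west", 20, "placeholder-health-check-2"),
        ]
    },
)
```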

Client-Side Load Balancing (No Load Balancer at All)

This one surprises a lot of people.

In client-side load balancing, there is no central load balancer. Instead, the client itself decides which backend to talk to. The client discovers all available servers, applies simple logic to pick one, and retries another server if the request fails.

A typical flow looks like this:

  1. The client fetches a list of healthy servers
  2. It picks one using hashing, round robin, or latency-based selection
  3. On failure, it retries a different server

This pattern is commonly seen in:

  • gRPC-based systems
  • Internal microservices
  • Service meshes

Service discovery systems like Consul or etcd often power this approach by continuously updating clients with healthy endpoints.
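
Here’s a minimal sketch of that flow in plain Python. Service discovery is faked with a static list (in practice it would be refreshed from Consul, etcd, or DNS), and selection is a simple round robin with retry on failure:

```python
import itertools
import urllib.request

# Assumption: in a real system this list comes from service discovery
# and is refreshed as instances come and go.
BACKENDS = [
    "http://10.0.0.11:8080",
    "http://10.0.0.12:8080",
    "http://10.0.0.13:8080",
]

_rotation = itertools.cycle(BACKENDS)

def call(path: str) -> bytes:
    """Try backends in round-robin order, moving on when one fails."""
    last_error = None
    for _ in range(len(BACKENDS)):
        backend = next(_rotation)
        try:
            with urllib.request.urlopen(backend + path, timeout=2) as resp:
                return resp.read()
        except OSError as exc:    # URLError, refusals, and timeouts are all OSErrors
            last_error = exc      # fall through to the next backend
    raise RuntimeError(f"all backends failed, last error: {last_error}")
```

Each client keeps its own rotation, so load spreads across backends with no shared state and no central proxy in the request path.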

Why do teams use this?

Client-side load balancing removes the load balancer entirely, which means:

  • No central bottleneck
  • Fewer network hops
  • Better resilience under high load

The tradeoff is increased complexity in the client, which now needs logic for discovery, retries, and failure handling. Because of this, client-side load balancing is most commonly used for backend-to-backend communication, where clients are controlled and easier to update.

Kernel-Level Load Balancing (Ridiculously Fast, Rarely Mentioned)

Most people think load balancing happens in user space. But Linux can do it inside the kernel.

While we mostly care about Layer 7 load balancers that understand HTTP, headers, and paths, there’s an entire class of Layer 4 load balancers that operate purely at the network level. These work by rewriting IP addresses and ports and forwarding packets to backend servers, without ever looking at the application data.

The most common example of this is IPVS (IP Virtual Server). IPVS runs inside the Linux kernel and listens on a virtual IP address. When traffic arrives, it forwards the connection to one of many backend servers using simple but extremely fast algorithms like round robin, least connections, or hashing.
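
IPVS itself is C code inside the kernel (typically managed with the ipvsadm tool or, in Kubernetes, by kube-proxy), so the sketch below is purely an illustration of the scheduling logic rather than how you’d actually drive it. With made-up backends, a least-connections decision looks like this:

```python
# Toy illustration of an L4 least-connections decision.
# IPVS tracks these counts in kernel space; the backends are made up.
active_connections = {
    "10.0.0.11:80": 12,
    "10.0.0.12:80": 7,
    "10.0.0.13:80": 9,
}

def assign_connection() -> str:
    """Send a new connection to the backend with the fewest active connections."""
    backend = min(active_connections, key=active_connections.get)
    active_connections[backend] += 1
    return backend

print(assign_connection())   # "10.0.0.12:80" given the counts above
```

A real L4 balancer makes this decision per connection at packet-forwarding speed, without ever parsing the bytes it forwards.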

Because this happens in kernel space, there is:

  • No userspace proxy in the hot path
  • Very low latency
  • Extremely high throughput

This is one of the reasons Kubernetes networking scales so well. When kube-proxy runs in IPVS mode, Kubernetes Services are load balanced using IPVS under the hood, silently handling millions of connections with minimal overhead.

Why do teams use this?

Kernel-level load balancers are chosen when performance matters most:

  • Near-zero overhead packet forwarding
  • No central userspace bottleneck
  • Ideal for east–west traffic and internal services

The tradeoff is that Layer 4 load balancers are intentionally simple. They don’t understand HTTP, can’t terminate TLS, and can’t route based on URLs or headers. Because of this, kernel-level load balancing is usually combined with Layer 7 proxies rather than replacing them.

Final Thoughts

There is no single “best” load-balancing strategy.

Most large systems combine multiple techniques: DNS or Anycast at the edge, kernel-level load balancing for internal traffic, and application-level proxies where request-level control is needed.

The real takeaway is this: load balancing is not just a component, it’s a design choice. When you pick the right technique at the right layer, you often need less complexity overall, not more.

And sometimes, the best load balancer is the one you barely notice at all.