Putting Google’s Network to work for You

Hi, I’m Simon Newton, a software engineer on Google’s Cloud Networking team. In this talk, I’ll cover the two different types of load balancing available on Google Compute Engine. This will demonstrate how you can now take advantage of Google’s technology to improve the performance of your own web applications.

At Google, we’ve spent the last 15 years building infrastructure to make our websites faster and more reliable. Today, we use these systems to serve YouTube, Maps, Gmail, Search, and many other products. But first, let’s go back in time to 1999 and take a look at the evolution of Google’s frontend serving infrastructure.

In the beginning, Google’s serving infrastructure was very simple, since we only had a single data center in California. We had a small number of web servers configured behind a hardware load balancer. The hardware load balancer had a virtual IP address, abbreviated as a VIP, which was provided to clients by one of our four DNS servers. There was very little redundancy in the system, so if there was ever a problem with the data center, Google.com would be down for everyone.

In 2000, as more people came to rely on Google, we added additional data centers on the East Coast of the United States. With these data centers there was a second set of web servers and another hardware load balancer. When clients resolved www.google.com, they were randomly sent to one of the two locations. This offered some redundancy, but increased latency for many of our users: half the time, clients on the West Coast were sent to the other side of the country.

To improve the situation, we switched to geolocating DNS servers. The DNS servers would use the IP address of the client’s DNS resolver to determine which VIP to return. This meant that in most cases, user requests were sent to the closest data center. However, sometimes during peak times the closest data center had insufficient capacity, so the DNS servers would send clients to the next closest data center. Overflowing requests in this way was bad for our users, because it meant we weren’t serving them from the optimal location, and so their latency was higher than normal. At Google, we refer to this method of balancing requests using DNS as frontend load balancing.

By this time, Google was offering more products than just web search, and we needed a way to direct requests based on their URL. We introduced a new system called the Google Front End, or GFE. The GFE accepts the client’s TCP connection and inspects the Host header and URL path to determine which backend service should handle the request. Backend services are groups of servers that can handle a particular class of traffic. For example, requests to maps.google.com will be sent to our maps servers, while requests to news.google.com will be sent to our news servers.
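This kind of Host-header and path-based dispatch can be sketched roughly as follows. The hostnames, rules, and service names here are illustrative only, not Google’s actual configuration:

```python
# Illustrative sketch of Host-header and URL-path routing, in the
# spirit of the GFE's dispatch (not Google's actual code or config).

ROUTES = [
    # (host, path_prefix, backend_service) - hypothetical rules
    ("maps.google.com", "/", "maps-backends"),
    ("news.google.com", "/", "news-backends"),
    ("www.google.com", "/maps", "maps-backends"),
    ("www.google.com", "/", "search-backends"),
]

def pick_backend_service(host: str, path: str) -> str:
    """Return the backend service for a request; first matching rule wins."""
    for rule_host, prefix, service in ROUTES:
        if host == rule_host and path.startswith(prefix):
            return service
    return "default-backends"

print(pick_backend_service("news.google.com", "/world"))      # news-backends
print(pick_backend_service("www.google.com", "/maps/place"))  # maps-backends
```

Because matching happens after the TCP connection is accepted, a single VIP can front many distinct services.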
The GFE is also responsible for health checking its backends. If a backend server fails to respond to health checks, the GFEs will stop sending traffic to the failed backend. By carefully tuning the GFE health checking parameters, we were able to upgrade the kernels or binaries of the GFE backends without disrupting any user requests. The GFEs also maintain persistent TCP connections to their backends, so the connections are ready to use as soon as a request arrives. This also helps us reduce latency for our users.
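A minimal version of this health-check-driven backend pool might look like the following. The addresses and the failure threshold are made up for illustration; the real GFE’s health checking is far more configurable:

```python
# Sketch of health-check-driven backend selection (illustrative only).
import random

class BackendPool:
    def __init__(self, backends, unhealthy_threshold=3):
        self.backends = backends
        self.failures = {b: 0 for b in backends}
        self.unhealthy_threshold = unhealthy_threshold

    def record_health_check(self, backend, ok):
        # Reset on success; count consecutive failures otherwise.
        self.failures[backend] = 0 if ok else self.failures[backend] + 1

    def healthy_backends(self):
        return [b for b in self.backends
                if self.failures[b] < self.unhealthy_threshold]

    def pick(self):
        # Send traffic only to backends currently passing health checks.
        return random.choice(self.healthy_backends())

pool = BackendPool(["10.0.0.1", "10.0.0.2"])
for _ in range(3):
    pool.record_health_check("10.0.0.2", ok=False)
print(pool.healthy_backends())  # ['10.0.0.1']
```

Draining a backend for a kernel or binary upgrade then amounts to letting it fail (or deliberately refuse) health checks until no new traffic arrives.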
At this point, we needed the ability to serve users from a different data center than the one the GFEs were running in. During maintenance windows or failure scenarios, we want to continue terminating the client’s TCP connections at the local GFE, but then use the backend service in an adjacent data center. To support this, we introduced another load balancing system called the Global Software Load Balancer, or GSLB. This system allows us to set per-data-center capacity for each backend service. When the rate of incoming requests for a service exceeds the local capacity, requests overflow to the next closest location. We call this backend load balancing.
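Capacity-based overflow of this kind can be sketched as follows. The locations and capacity numbers are hypothetical, and a real system like GSLB considers far more than a static capacity figure:

```python
# Sketch of GSLB-style capacity overflow (illustrative only).

# Per-data-center capacity for one backend service, in requests/sec.
CAPACITY_QPS = {"us-west": 1000, "us-east": 1500, "europe": 800}
current_qps = {"us-west": 0, "us-east": 0, "europe": 0}

def route_request(locations_by_distance):
    """Send the request to the closest location with spare capacity."""
    for loc in locations_by_distance:
        if current_qps[loc] < CAPACITY_QPS[loc]:
            current_qps[loc] += 1
            return loc
    raise RuntimeError("all locations over capacity")

# A client closest to us-west overflows to us-east once us-west is full.
current_qps["us-west"] = CAPACITY_QPS["us-west"]
print(route_request(["us-west", "us-east", "europe"]))  # us-east
```

The key point is that overflow happens behind the GFE, so the client’s TCP connection still terminates at the nearby frontend even when the backend work is done elsewhere.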
As Google continued to expand, we added new data centers in Europe, Asia, and South America. By locating GFEs closer to the users, we could reduce the round-trip time between the user and the GFE. This reduced the TCP handshake time and, in turn, further decreased the latency for our users.

As Google continued to grow, we found that off-the-shelf hardware load balancers no longer met our performance and scalability requirements, so we replaced all of the load balancers with our own custom-built solution called Maglev. This enabled us to experiment with new load balancing algorithms and increased the reliability of the entire system.
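One property such a software load balancer needs is that a given connection keeps landing on the same backend even as backends are added or removed. The sketch below illustrates that idea with generic rendezvous (highest-random-weight) hashing; it is not Maglev’s actual algorithm, which Google has described separately and which builds a lookup table differently:

```python
# Generic consistent-hashing sketch: a connection keeps mapping to the
# same backend even as other backends come and go. Illustrative only;
# not Maglev's published table-building algorithm.
import hashlib

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def pick_backend(connection_5tuple: str, backends: list) -> str:
    # Rendezvous hashing: every backend scores the connection and the
    # top scorer wins, so removing one backend only remaps the
    # connections that were assigned to it.
    return max(backends, key=lambda b: h(connection_5tuple + b))

backends = ["backend-a", "backend-b", "backend-c"]
conn = "1.2.3.4:5678->8.8.8.8:443/tcp"
before = pick_backend(conn, backends)
backends.remove("backend-b")
after = pick_backend(conn, backends)
# If the winning backend wasn't the removed one, the mapping is stable.
print(before, after)
```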
Finally, DNS-based load balancing is far from perfect, since it can only load balance at the granularity of a single DNS resolver. If two or more groups of users are in different regions but behind the same DNS resolver, some of them may be sent to the wrong location. We often see this situation with open resolvers. For example, if one group of users is in India and the other is in Japan, and both groups share the same resolver, we may end up sending them all to Taiwan, rather than sending each group of users to their best location.

There are also some resolvers and clients that do not honor DNS record TTLs. Even with a short five-minute TTL, it can take a while for load to shift from one location to another. This makes us slow to respond to outages, and leaves a long tail of clients stuck on a single VIP.

To avoid this, we’ve developed a method of sending clients to the closest data center without relying on DNS geolocation. This means we can use a single VIP and still give our customers the low latency they’ve come to expect from Google. Google’s global load balancer knows where the clients are located and directs the packets to the closest location. Using a single VIP means we can increase the TTL of our DNS records, which further reduces latency.

You can now take advantage of this same infrastructure with Google Cloud Platform. We offer two types of load balancing: network and HTTP. Network Load Balancing, launched in 2013, spreads traffic over VMs within a region. It doesn’t modify the packets, so VM instances can see the source IP address of the client if required. Network Load Balancing can be used with both TCP and UDP.

The second and latest offering is HTTP Load Balancing. HTTP Load Balancing offers a single global IP address. It uses the techniques described in this video to send requests to the closest VM instance with available capacity. This means DNS-based load balancing is not required, so we avoid the issues with split resolver populations. HTTP Load Balancing uses Google’s global deployment footprint by connecting clients to the closest GFE. By taking advantage of persistent connections to the VMs, we can reduce the latency of web applications even if the VMs are located in a single region.

Advanced customers can also take advantage of URL maps with HTTP Load Balancing. Just as Google uses different backend services for different products, URL maps allow customers to send portions of the URL space to different groups of VMs. For example, you may want to serve static files from a different set of VM instances than the dynamic content. HTTP Load Balancing allows you to do this.
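The effect of a URL map can be sketched like this. The service names and path rules are hypothetical, and in practice a URL map is a Compute Engine API resource you configure, not code you write:

```python
# Sketch of URL-map semantics: the most specific matching path rule
# wins, otherwise traffic goes to the default service. (Hypothetical
# services; real URL maps are Compute Engine API resources.)

URL_MAP = {
    "default_service": "dynamic-content-vms",
    "path_rules": [
        ("/static/", "static-files-vms"),
        ("/images/", "static-files-vms"),
    ],
}

def resolve(path: str) -> str:
    matches = [(prefix, svc) for prefix, svc in URL_MAP["path_rules"]
               if path.startswith(prefix)]
    if matches:
        # Prefer the most specific (longest) matching prefix.
        return max(matches, key=lambda m: len(m[0]))[1]
    return URL_MAP["default_service"]

print(resolve("/static/app.css"))  # static-files-vms
print(resolve("/search"))          # dynamic-content-vms
```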
Hopefully, this gives you a glimpse of how Google’s frontend infrastructure has evolved, and how you can take advantage of it to improve the performance and reliability of your own applications. So whether you’re a large enterprise or a new startup, we invite you to benefit from using Google Cloud Platform.

Author: Kevin Mason
