Cloud Load Balancers - Primer


Warning

Under Construction

As with any technology, there are times when unexpected problems can occur. The problems encountered in the cloud may appear complex; however, I hope to show how simple the service really is.

Terminology

Load Balancer
A load balancer is a logical device which belongs to a cloud account. It is used to distribute workloads between multiple back-end systems or services, based on the criteria defined as part of its configuration.
VIP
A virtual IP is an Internet Protocol (IP) address configured on the load balancer for use by clients connecting to a service that is load balanced. Incoming connections are distributed to back-end nodes based on the configuration of the load balancer.
Source Network
The IP range from which the Cloud Load Balancing service sends traffic to the nodes. The specific source IP of the CLB can be obtained via the API.
Node
A node is a back-end device providing a service on a specified IP and port.

(reference: CLB API Glossary)

Network Topology

The basic topology of the Cloud Load Balancers service, as far as we are concerned, is shown in this simple image.

http://images.cdn.rackspace.com/cloud/cloud-computing-products/loadbalancers/graphic-technology.png

(reference: Cloud Load Balancers Product Page)

Let's break down the topology to get a better understanding of how traffic flows between the Internet, the load balancer and the back-end nodes.

e.g.

Internet <--> VIP [Load Balancer] Source Network <--> Nodes

In this example:

Internet <--> 206.10.10.210 [Load Balancer] 10.189.254.0/23 <--> 10.176.2.19
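When you look at traffic on a node, it helps to know whether a peer address belongs to the load balancer's source network or to something else. Here is a minimal Python sketch of that check, using the example subnet and addresses from above. The function name `is_clb_source` is only illustrative, and you would substitute the source network for your own region.

```python
import ipaddress

# Example source network from the topology above; substitute your own.
SOURCE_NETWORK = ipaddress.ip_network("10.189.254.0/23")


def is_clb_source(address, network=SOURCE_NETWORK):
    """Return True if `address` falls inside the load balancer's source network."""
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    # Two addresses inside the /23 and the example node address, which is outside it.
    for candidate in ("10.189.254.7", "10.189.255.200", "10.176.2.19"):
        print(candidate, is_clb_source(candidate))
```

A /23 spans two adjacent /24 blocks, so both 10.189.254.x and 10.189.255.x count as load balancer traffic in this example.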

Connectivity

Note

I was initially going to include command output; however, I suggest trying the commands yourself to see what the output looks like in your environment.

There are a few ways to monitor traffic on the node, chiefly good old tcpdump and ngrep. While tcpdump is good and available on most systems, I recommend trying out ngrep, especially when it comes to quickly getting info out of HTTP requests.

Let's jump right in and begin! Make a request to a site being load balanced, while using tcpdump/ngrep on the backend node.

Method A: tcpdump

Brief flags overview. Please reference the tcpdump man pages for further information.

-s Snapshot length. Capture only the first 1024 bytes of each packet, which is enough here since we're looking at the headers.
-A Print each packet in ASCII. Needed since we're interested in the headers, and handy for viewing the content of the site.
-q Quiet/Quick output.
-p Don't put the interface into promiscuous mode
-n Don't convert addresses to names. Given twice (-nn), port numbers are left numeric as well.
-i The interface we're dumping traffic on.

e.g.

tcpdump -s 1024 -Aqpnni eth1 port 80

Notice the output is not very human friendly but it will get the job done.

Method B: ngrep

Brief flags overview. Please reference the ngrep man pages for further information.

-S Look at first 1024 bytes of packet.
-q Be quiet.
-p Don't put the interface into promiscuous mode
-W Display packets in specified mode. The byline mode honors embedded linefeeds, wrapping text only when a linefeed is encountered (useful for observing HTTP transactions, for instance).
-d The interface we're dumping traffic on.
-i Ignore case in the regular expression.

e.g.

ngrep -S 1024 -qpW byline -d eth1 -i 'x-forwarded-for|x-forwarded-proto' port 80

This output is considerably easier to read, and individual requests are easier to follow.
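If you save a captured request to a file, the same filtering can be done after the fact. The following is a rough Python sketch that applies the same case-insensitive pattern to the header names of a raw HTTP request. The sample request and the function name `forwarded_headers` are made up for illustration.

```python
import re

# Same case-insensitive pattern handed to ngrep above.
HEADER_PATTERN = re.compile(r"x-forwarded-for|x-forwarded-proto", re.IGNORECASE)


def forwarded_headers(raw_request):
    """Return {lowercased header name: value} for the X-Forwarded-* header lines."""
    found = {}
    for line in raw_request.splitlines():
        name, sep, value = line.partition(":")
        # Lines without a colon (request line, blank line) are skipped.
        if sep and HEADER_PATTERN.fullmatch(name.strip()):
            found[name.strip().lower()] = value.strip()
    return found


if __name__ == "__main__":
    # Hypothetical request as it might arrive on the node from the load balancer.
    sample = (
        "GET / HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "X-Forwarded-For: 203.0.113.25\r\n"
        "X-Forwarded-Proto: http\r\n"
        "\r\n"
    )
    print(forwarded_headers(sample))
```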

That's About It

I thought there would be more to discuss but in reality it either works or it doesn't, (pushing them packets back ... and ... forth). Also check out the Cloud Load Balancers - The Missing Manual post for more information. I guess now I can move on to um... other topics on Scalable Technology, (aka "the cloud"...).


Cloud Load Balancers - The Missing Manual


The Rackspace Cloud Load Balancers service is pretty simple and for the most part "just works". In using the service, however, I have found a few items that are not obvious (e.g. 500 Internal Server Error but nodes are working?) or not given enough attention (e.g. Node Service Events).

A few things we're going to review.

Health Monitoring/5XX Status

So what happens if all or a few of your nodes in the LB pool are failing? A 500 error, that's what! This of course is a bad thing...

For example, you may see the load balancer give this error:

../galleries/clb_service_unavailable.png

The 500 Internal Server Error [1] means that:

A generic error message, given when an unexpected condition was encountered and no more specific message is suitable.
[1] Wikipedia - List of HTTP status codes

With Cloud Load Balancers this means:

  • Node health monitoring is not enabled for the cloud load balancer.
  • A node or nodes are failing to respond to requests coming from the CLB service.

Without health monitoring, failing nodes remain in rotation until they either recover or fail completely.

So what can you do?

  • Enable health monitoring, (more info: here and here).
  • Verify the nodes are working as expected, (e.g. the node is online, service running, etc).

There are a few things that could trip you up as well in regard to health monitoring. A few being:

  • Unsupported body regex. The CLB's regex matching is pretty simple, so don't try to be too fancy with response pattern matching.
  • The pattern match has to be within the first 2048 bytes of the response. Thus, if you're attempting to match a pattern at the bottom of a complex page, the pattern won't get matched and the node gets marked as offline. Remember, keep it simple.
  • Unset "host" header. This usually affects HTTP servers that require the "host" header to be set when making a request (e.g. "host: www.example.com"). For instance, you may find yourself enabling the health monitor and seeing nodes marked offline immediately. If this happens, update the health monitor's configuration and set the "hostHeader" option (more on this in the Using The CLB API section).
  • Not allowing access from the CLB service's source IP addresses. For a list of internal subnets used, see this Rackspace KB.
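To see why a pattern near the bottom of a page never matches, here is a small Python sketch that models the 2048-byte window described above. It is not the CLB's actual matcher: Python's `re` module is far more capable than what the service supports, and the function name `body_would_match` is invented for this illustration.

```python
import re

MATCH_WINDOW = 2048  # bytes of the response body inspected, per the list above


def body_would_match(body, pattern, window=MATCH_WINDOW):
    """Model the check: search for `pattern` only in the first `window` bytes."""
    return re.search(pattern, body[:window]) is not None


if __name__ == "__main__":
    marker = b"<!-- status: OK -->"
    early = marker + b"x" * 5000   # marker at the top of the page
    late = b"x" * 5000 + marker    # marker pushed past the window
    print(body_would_match(early, rb"status: OK"))
    print(body_would_match(late, rb"status: OK"))
```

Running a check like this against a saved copy of your health check page is a quick way to confirm the marker sits early enough in the response.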

Update Node Logging

Now that traffic is coming from the load balancer, the nodes' logs will show the source IP of the load balancer. That is, rather than the IP of the end user/client, the IP of the load balancer will be logged. The following provides information on logging the X-Forwarded-For HTTP header, which carries the end user's IP. The X-Forwarded-Proto HTTP header may be of some interest as well.
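If your application does its own logging, the idea looks roughly like the Python sketch below: prefer the first entry of X-Forwarded-For and fall back to the socket peer address. The function name `client_ip` and the sample addresses are assumptions for illustration. Keep in mind the header is supplied by whoever sends the request, so it is only worth trusting when the request actually arrived from the load balancer.

```python
def client_ip(headers, remote_addr):
    """Pick the address to log: first X-Forwarded-For entry, else the socket peer."""
    forwarded = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-forwarded-for":
            forwarded = value
            break
    first = forwarded.split(",")[0].strip()
    return first or remote_addr


if __name__ == "__main__":
    # Hypothetical request that came through the load balancer.
    print(client_ip({"X-Forwarded-For": "203.0.113.25"}, "10.189.254.7"))
    # Hypothetical direct request without the header.
    print(client_ip({"Host": "www.example.com"}, "198.51.100.4"))
```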

Add Node Hostname Header

This is optional; however, I recommend it, especially when you're trying to figure out which node in the load balancing pool is having a problem.

Using The CLB API

The Cloud Load Balancers API provides a lot of functionality that isn't provided via the MyCloud Panel. A few we're going to look at are:

API Authentication

The flow for working with any of the Rackspace APIs is:

  • Authenticate
  • Token is returned
  • Use token to interact with whatever service

As I mentioned, before moving forward we will need to authenticate to the Rackspace Identity service in order to get a token to use for the API calls. See this Rackspace KB on locating API credentials.

AUTH=https://identity.api.rackspacecloud.com
curl -s -X POST $AUTH/v2.0/tokens -d '{ "auth": { "RAX-KSKEY:apiKeyCredentials": { "username":"USERNAME", "apiKey":"APIKEY" } } }' -H "content-type: application/json" | python -m json.tool
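For scripting, the same request can be put together in Python. This is a minimal sketch using only the standard library; it builds the request without sending it, and the function name `build_auth_request` is my own. USERNAME and APIKEY are placeholders, just as in the curl call.

```python
import json
import urllib.request

AUTH = "https://identity.api.rackspacecloud.com"


def build_auth_request(username, api_key, auth_url=AUTH):
    """Build (but do not send) the POST to /v2.0/tokens."""
    payload = {
        "auth": {
            "RAX-KSKEY:apiKeyCredentials": {
                "username": username,
                "apiKey": api_key,
            }
        }
    }
    return urllib.request.Request(
        auth_url + "/v2.0/tokens",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_auth_request("USERNAME", "APIKEY")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it for real: json.load(urllib.request.urlopen(request))
```

Building the payload from a dict and serializing it with `json.dumps` avoids hand-balancing braces inside a quoted shell string.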

From the output response, take note of the following:

  • publicURL
  • token id

For example, if you're using Cloud Load Balancers in Virginia:

Service Catalog

"serviceCatalog": [
    {
    "publicURL": "https://iad.loadbalancers.api.rackspacecloud.com/v1.0/000042",
    "region": "IAD",
    "tenantId": "000042"
    },
]

Token

"token": {
    "RAX-AUTH:authenticatedBy": [
        "APIKEY"
    ],
    "expires": "2012-04-13T13:15:00.000-05:00",
    "id": "aaaaa-bbbbb-ccccc-dddd",
    "tenant": {
        "id": "000042",
        "name": "000042"
    }
},
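Picking these two values out by eye gets old quickly. Below is a hedged Python sketch that walks a parsed authentication response and returns the load balancer endpoint for a region along with the token id. Since the excerpts above are trimmed, the code makes no assumption about how deeply the objects are nested; it simply searches for them. The function names and the "loadbalancers" substring filter are my own choices.

```python
def find_endpoint(node, region, contains="loadbalancers"):
    """Depth-first search for a publicURL in `region` whose URL mentions `contains`."""
    if isinstance(node, dict):
        url = node.get("publicURL")
        if node.get("region") == region and isinstance(url, str) and contains in url:
            return url
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_endpoint(child, region, contains)
        if found is not None:
            return found
    return None


def find_token_id(node):
    """Depth-first search for the `id` inside the first `token` object."""
    if isinstance(node, dict):
        token = node.get("token")
        if isinstance(token, dict) and "id" in token:
            return token["id"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_token_id(child)
        if found is not None:
            return found
    return None
```

Given the parsed response as `data`, `find_endpoint(data, "IAD")` and `find_token_id(data)` yield the values to use as ENDPOINT and TOKEN in the calls that follow.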

Now that we have the needed information, let's move forward and gather some data from the Cloud Load Balancers API.

List Load Balancers

Before we can actually do anything with our cloud load balancer, we have to have its instance ID. This can be obtained from the MyCloud Panel or from the API. For example, using the information we obtained when we authenticated to the Identity service, get a list of load balancers in IAD, (Virginia).

From the example output response, take note of the following:

  • name (e.g. "name": "lb-site1")
  • id (e.g. "id": 71)

ENDPOINT=https://iad.loadbalancers.api.rackspacecloud.com/v1.0/000042
TOKEN=aaaaa-bbbbb-ccccc-dddd
curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers | python -m json.tool

{
"loadBalancers":[
    {
        "name":"lb-site1",
        "id":71,
        "protocol":"HTTP",
        "port":80,
        "algorithm":"RANDOM",
        "status":"ACTIVE",
        "nodeCount":3,
        "virtualIps":[
            {
                "id":403,
                "address":"206.55.130.1",
                "type":"PUBLIC",
                "ipVersion":"IPV4"
            }
        ],
        "created":{
            "time":"2010-11-30T03:23:42Z"
        },
        "updated":{
            "time":"2010-11-30T03:23:44Z"
        }
    }
]
}
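If you have more than a handful of load balancers, a name-to-id lookup saves some scrolling. Here is a minimal Python sketch that assumes the response has already been parsed into a dict shaped like the output above. The function name `load_balancer_ids` is illustrative.

```python
import json


def load_balancer_ids(listing):
    """Map each load balancer name to its instance ID."""
    return {lb["name"]: lb["id"] for lb in listing.get("loadBalancers", [])}


if __name__ == "__main__":
    # Trimmed-down version of the listing shown above.
    sample = json.loads('{"loadBalancers": [{"name": "lb-site1", "id": 71}]}')
    print(load_balancer_ids(sample))
```

If two load balancers share a name, the later one in the listing wins in this dict, so use unique names or key on the id instead.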

Load Balancer Statistics

Load balancer stats provide a brief overview of how the nodes in the pool are performing. From the documentation:

  • connectTimeOut – Connections closed by this load balancer because the 'connect_timeout' interval was exceeded.
  • connectError – Number of transaction or protocol errors in this load balancer.
  • connectFailure – Number of connection failures in this load balancer.
  • dataTimedOut – Connections closed by this load balancer because the 'timeout' interval was exceeded.
  • keepAliveTimedOut – Connections closed by this load balancer because the 'keepalive_timeout' interval was exceeded.
  • maxConn – Maximum number of simultaneous TCP connections this load balancer has processed at any one time.

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/stats | python -m json.tool

{
"connectTimeOut":10,
"connectError":20,
"connectFailure":30,
"dataTimedOut":40,
"keepAliveTimedOut":50,
"maxConn":60
}
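One way to use these numbers is to flag any error-type counter that is above zero. The Python sketch below does that for the fields listed above; which counters deserve attention is my own judgment, not something the API documentation prescribes, and the function name is made up.

```python
# Counters that indicate timeouts or failures; maxConn is a high-water mark, not an error.
ERROR_COUNTERS = (
    "connectTimeOut",
    "connectError",
    "connectFailure",
    "dataTimedOut",
    "keepAliveTimedOut",
)


def nonzero_error_counters(stats):
    """Return only the error-type counters whose value is above zero."""
    return {name: stats[name] for name in ERROR_COUNTERS if stats.get(name, 0) > 0}


if __name__ == "__main__":
    # The example stats from above.
    sample = {
        "connectTimeOut": 10,
        "connectError": 20,
        "connectFailure": 30,
        "dataTimedOut": 40,
        "keepAliveTimedOut": 50,
        "maxConn": 60,
    }
    print(nonzero_error_counters(sample))
```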

Node Service Events

Note

Health monitoring must be enabled in order to capture node events!

Node events can be extremely useful when diagnosing outages experienced with the nodes.

For example:

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/nodes/events | python -m json.tool

   {
    "nodeServiceEvents": [
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "12-12-2013 23:07:01",
            "description": "Node '373' status changed to 'OFFLINE' for load balancer '71'",
            "detailedMessage": "Write failed: No route to host",
            "id": 95901,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        },
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "01-02-2014 21:40:18",
            "description": "Node '373' status changed to 'OFFLINE' for load balancer '71'",
            "detailedMessage": "Timeout while waiting for valid server response",
            "id": 125649,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        },
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "01-02-2014 21:48:44",
            "description": "Node '373' status changed to 'ONLINE' for load balancer '71'",
            "detailedMessage": "Node is working",
            "id": 125675,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        },
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "01-19-2014 23:11:40",
            "description": "Node '373' status changed to 'OFFLINE' for load balancer '71'",
            "detailedMessage": "Write failed: Connection refused",
            "id": 139491,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        },
        {
            "accountId": 000042,
            "author": "Rackspace Cloud",
            "category": "UPDATE",
            "created": "01-19-2014 23:11:52",
            "description": "Node '373' status changed to 'ONLINE' for load balancer '71'",
            "detailedMessage": "Node is working",
            "id": 139497,
            "loadbalancerId": 71,
            "nodeId": 373,
            "relativeUri": "/000042/loadbalancers/71/nodes/373/events",
            "severity": "INFO",
            "title": "Node Status Updated",
            "type": "UPDATE_NODE"
        }
    ]
}
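With a long event list it can be handy to turn the raw events into outage windows. The following Python sketch pairs each ONLINE event with the most recent OFFLINE event for the same node. It assumes the "created" timestamps are MM-DD-YYYY HH:MM:SS, which is inferred from the sample above, and that the description wording stays as shown. The function name `outage_windows` is my own.

```python
import re
from datetime import datetime

STATUS_RE = re.compile(r"status changed to '(ONLINE|OFFLINE)'")
CREATED_FORMAT = "%m-%d-%Y %H:%M:%S"  # assumed from the sample timestamps


def outage_windows(events):
    """Yield (node_id, went_offline, came_online) for each OFFLINE -> ONLINE pair."""
    offline_since = {}  # node_id -> datetime of the most recent OFFLINE event
    ordered = sorted(
        events, key=lambda e: datetime.strptime(e["created"], CREATED_FORMAT)
    )
    for event in ordered:
        match = STATUS_RE.search(event["description"])
        if match is None:
            continue  # not a status-change event
        when = datetime.strptime(event["created"], CREATED_FORMAT)
        node = event["nodeId"]
        if match.group(1) == "OFFLINE":
            # A repeated OFFLINE replaces the earlier one.
            offline_since[node] = when
        elif node in offline_since:
            yield node, offline_since.pop(node), when
```

Fed the five events above, this yields two windows for node 373: one of 8 minutes 26 seconds on 01-02-2014 and one of 12 seconds on 01-19-2014. The OFFLINE event from 12-12-2013 is superseded by the next OFFLINE, since no ONLINE event appears between them.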

Updating Load Balancer Configuration

There are a few configuration items that are not exposed via the MyCloud Panel. I found the following useful in certain cases, such as:

  • IIS and/or strictly configured HTTPD services will give an unexpected response when a request is made directly to the IP address of the server.
  • HTTPS only load balancing if the service will not use HTTP.

Set hostHeader for Health Monitoring

The easiest method to update a Health Monitor is to create it via the MyCloud Panel, pull it from the API, then update the response payload to include the hostHeader value.

Pull Health Monitoring Configuration

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/healthmonitor | python -m json.tool

 {
     "healthMonitor": {
         "attemptsBeforeDeactivation": 2,
         "delay": 10,
         "path": "/",
         "statusRegex": "^[234][0-9][0-9]$",
         "timeout": 5,
         "type": "HTTP"
     }
 }

Update Health Monitoring Configuration

curl -s -X PUT -H "content-type: application/json" -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/healthmonitor -d '{"healthMonitor": { "attemptsBeforeDeactivation": 2, "delay": 10, "hostHeader": "www.virtualdisaster.net", "path": "/", "statusRegex": "^[234][0-9][0-9]$", "timeout": 5, "type": "HTTP" }}' -i

HTTP/1.1 202 Accepted

Updated Health Monitor

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71/healthmonitor | python -m json.tool

 {
 "healthMonitor": {
     "attemptsBeforeDeactivation": 2,
     "delay": 10,
     "hostHeader": "www.virtualdisaster.net",
     "path": "/",
     "statusRegex": "^[234][0-9][0-9]$",
     "timeout": 5,
     "type": "HTTP"
     }
 }
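The pull, edit and push cycle is easy to script. Here is a small Python sketch of the edit step only: it takes the pulled configuration as a dict and returns the JSON body for the PUT with hostHeader added. The function name `with_host_header` is illustrative, and sending the request is left out.

```python
import json


def with_host_header(monitor_response, host):
    """Return a PUT body: the pulled health monitor plus a hostHeader value."""
    monitor = dict(monitor_response["healthMonitor"])  # copy, leave the input untouched
    monitor["hostHeader"] = host
    return json.dumps({"healthMonitor": monitor})


if __name__ == "__main__":
    # The configuration as pulled from the API above.
    pulled = {
        "healthMonitor": {
            "attemptsBeforeDeactivation": 2,
            "delay": 10,
            "path": "/",
            "statusRegex": "^[234][0-9][0-9]$",
            "timeout": 5,
            "type": "HTTP",
        }
    }
    print(with_host_header(pulled, "www.virtualdisaster.net"))
```

Starting from the pulled configuration rather than typing the body by hand keeps the other settings (delay, timeout, statusRegex) exactly as they were.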

HTTPS only load balancers

If you plan on only offering HTTPS enabled services, the load balancing service has functionality to 301 redirect HTTP requests to HTTPS.

Enable httpsRedirect on the load balancer.

curl -s -X PUT -H "content-type: application/json" -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71 -d '{ "loadBalancer": { "httpsRedirect": true } }' -i

HTTP/1.1 202 Accepted

HTTPS redirect enabled load balancer (output truncated for brevity)

curl -s -H "x-auth-token: $TOKEN" $ENDPOINT/loadbalancers/71 | python -m json.tool

{
"loadBalancer":{
"name":"lb-site1",
"algorithm": "RANDOM",
"protocol": "HTTP",
"port": 80,
"timeout": 60,
"connectionLogging": true,
"httpsRedirect": true
 }
}
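For completeness, here is the same update built as a Python request object with the standard library. It is only a sketch: the request is constructed but not sent, the function name `build_https_redirect_request` is mine, and the endpoint, token and id are the example values used throughout this post.

```python
import json
import urllib.request


def build_https_redirect_request(endpoint, token, lb_id, enabled=True):
    """Build (but do not send) the PUT that toggles httpsRedirect on a load balancer."""
    body = json.dumps({"loadBalancer": {"httpsRedirect": enabled}}).encode("utf-8")
    return urllib.request.Request(
        "{}/loadbalancers/{}".format(endpoint, lb_id),
        data=body,
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="PUT",
    )


if __name__ == "__main__":
    request = build_https_redirect_request(
        "https://iad.loadbalancers.api.rackspacecloud.com/v1.0/000042",
        "aaaaa-bbbbb-ccccc-dddd",
        71,
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```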

Primer

Initially this was going to be a "troubleshooting" guide; however, it is more of a quick reference on practical use of the service. Check it out here.
