Monday, December 19, 2011

Setting up Queue-size-based auto scaling groups in AWS

One of AWS' coolest features is the ability to scale in and out according to custom criteria, such as machine load or number of requests. For the sake of this tutorial, we are going to focus on queue-size-based auto scaling: once a certain number of messages accumulates in a queue, an alarm goes off and triggers the auto scaling policy. This feature can also be used in conjunction with some sort of load balancing mechanism. In the AWS ecosystem, we can accomplish this using some of their off-the-shelf services, namely SQS, CloudWatch, and AutoScaling. As of this writing, the AWS management console offers no UI for AutoScaling groups or any of their underlying requirements, so we are going to use boto, a great and easy-to-use Python-based AWS API library.

For all the steps that follow, please remember that, as a general rule, AWS services are region-specific: they are only visible and usable within the region in which they were created. Also bear in mind that each of these services has a cost associated with it as per AWS pricing (see each of the product pages above for pricing details).

First, let's set up the queue that we're going to post messages to, receive messages from, and base our auto scaling on. To do this, log into your AWS management console, head to the SQS tab, and click on Create New Queue. A modal window will pop up like the one shown below:

[Image: SQS_STEP_1]

A bit on the parameters of the queue creation:

Default Visibility Timeout: The number of seconds (up to 12 hours) that a message remains invisible to other receivers after it has been delivered.

Message Retention Period: The time (up to 14 days) after which messages are automatically deleted if the receiver(s) haven't deleted them.

Maximum Message Size: The maximum size of a queue message (in KB, up to 64).

Delay Delivery: The time (in seconds) that the queue will hold a message before it is delivered for the first time.
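
If you would rather create the queue from a script, the same limits can be captured in code. Below is a minimal sketch: queue_attributes is a hypothetical helper (not part of boto) that checks values against the limits listed above and returns them under what I understand to be the corresponding SQS attribute names. The commented lines at the end show how I would expect to apply them with boto's SQS module; the queue name is a placeholder.

```python
# Limits as listed in the queue creation dialog described above.
MAX_VISIBILITY_TIMEOUT = 12 * 60 * 60      # 12 hours, in seconds
MAX_RETENTION_PERIOD = 14 * 24 * 60 * 60   # 14 days, in seconds
MAX_MESSAGE_SIZE_KB = 64


def queue_attributes(visibility_timeout, retention_period, max_size_kb, delay):
    """Validate queue settings and return them keyed by SQS attribute name.

    Hypothetical helper for illustration; all times are in seconds.
    """
    if not 0 <= visibility_timeout <= MAX_VISIBILITY_TIMEOUT:
        raise ValueError("visibility timeout must be between 0 and 12 hours")
    if not 0 < retention_period <= MAX_RETENTION_PERIOD:
        raise ValueError("retention period must be at most 14 days")
    if not 0 < max_size_kb <= MAX_MESSAGE_SIZE_KB:
        raise ValueError("maximum message size must be at most 64 KB")
    if delay < 0:
        raise ValueError("delivery delay cannot be negative")
    return {
        "VisibilityTimeout": visibility_timeout,
        "MessageRetentionPeriod": retention_period,
        "MaximumMessageSize": max_size_kb * 1024,  # SQS expects bytes
        "DelaySeconds": delay,
    }


attrs = queue_attributes(visibility_timeout=30,
                         retention_period=4 * 24 * 60 * 60,
                         max_size_kb=64, delay=0)

# Applying the settings with boto (needs credentials and network access):
#   import boto.sqs
#   sqs = boto.sqs.connect_to_region('us-west-2')
#   queue = sqs.create_queue('your-queue-name')
#   for name, value in attrs.items():
#       queue.set_attribute(name, value)
```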

Our next steps will include setting up the auto scaling group (and all the underlying services) and then setting up CloudWatch to handle the monitoring and issuing of alarms.

So, assuming you have Python and boto already installed, we're going to create a script to do the heavy lifting for us. The way the API and auto scaling work in boto is as follows. First you need a Launch Configuration (LC). A Launch Configuration, as its name states, is metadata about what you want to launch every time the alarm is triggered (i.e. which AMI, security groups, kernel, userdata and so forth). Then you need an Auto Scaling Group (ASG). ASGs are the imaginary "containers" for your auto scaling instances and contain information about Availability Zones (AZ), LCs and group size parameters. Then, in order to actually do the scaling, you'll need at least one Scaling Policy (SP). SPs describe the desired scaling behavior of a group when certain criteria are met or an alarm is set off. The last piece of the puzzle is a CloudWatch alarm, which I will address later.

So, back to our script. First, import the necessary modules:
from boto.ec2.autoscale import AutoScaleConnection, LaunchConfiguration, AutoScalingGroup
from boto.ec2.regioninfo import RegionInfo
from boto.ec2.autoscale.policy import AdjustmentType, MetricCollectionTypes, ScalingPolicy

As an aside, while boto lets you set your AWS credentials in a boto config file, I like having the credentials within the scripts themselves to make it more direct and explicit. Feel free to use the boto config if that's your preference.
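
If you do go the config route, boto reads credentials from a [Credentials] section in its config file (typically ~/.boto). A minimal file would look something like this, with placeholder values:

```ini
[Credentials]
aws_access_key_id = YOUR_AWS_KEY_ID_HERE
aws_secret_access_key = YOUR_AWS_SECRET_KEY_HERE
```

With that file in place, the key and secret arguments can be left out when creating a connection.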

First thing we need to do is to establish an auto scaling connection to our region of choice -- in this example, the Oregon region (aka us-west-2). To do so, we do as follows:
AWS_KEY = '[YOUR_AWS_KEY_ID_HERE]'
AWS_SECRET = '[YOUR_AWS_SECRET_KEY_HERE]'

reg = RegionInfo(name='us-west-2',  endpoint='autoscaling.us-west-2.amazonaws.com')
conn = AutoScaleConnection(AWS_KEY, AWS_SECRET,  region = reg,  debug = 0)

We then need to create the LC. In the code below I added many parameters for the sake of illustration, but not all of them are required by either AWS or boto. I believe that the only required fields are name and image_id. Bear in mind, though, that if you choose to use these optional parameters, they need to be accurate else you'll get an error in the create launch configuration API request.
lc = LaunchConfiguration(name="LC-name", image_id="ami-12345678",
                         instance_type="m1.large", key_name="Your-Key-Pair-Name",
                         security_groups=['sg-12345678', 'sg-87654321'])
conn.create_launch_configuration(lc)

The next step is to set up the ASG. Choose your min and max size carefully, especially if your scenario will scale based on a queue that can be directly or indirectly DDoS attacked. While you wouldn't want your site to be unresponsive to your customers, you also wouldn't want would-be attackers to scale you up to a very hefty bill. So, as a good practice, set an upper bound on your scaling groups.
ag = AutoScalingGroup(group_name="your-sg-name",
                      availability_zones=['us-west-2a', 'us-west-2b'],
                      launch_config=conn.get_all_launch_configurations(names=['LC-name'])[0],
                      min_size=0, max_size=10)
conn.create_auto_scaling_group(ag)

We are almost done with the auto scaling setup; however, without a way to trigger auto scaling, all is for naught. To this end, AWS lets you set different scaling criteria in the form of Scaling Policies (SP). Any self-respecting AS scheme has some sort of symmetry, that is to say, for every scale up there's a scale down. If you don't have a scale down, chances are you'll end up wasting resources/capacity and won't be entirely happy with your monthly bill. The way we set the SPs with boto is as follows:
sp_up = ScalingPolicy(name='AS-UPSCALE', adjustment_type='ChangeInCapacity',
                      as_name='your-sg-name', scaling_adjustment=1, cooldown=30)
conn.create_scaling_policy(sp_up)

sp_down = ScalingPolicy(name='AS-DOWNSCALE', adjustment_type='ChangeInCapacity',
                        as_name='your-sg-name', scaling_adjustment=-1, cooldown=30)
conn.create_scaling_policy(sp_down)
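
To see why the bounds on the group matter, here is a small simulation of how ChangeInCapacity adjustments interact with min_size and max_size. It is a simplified model of my understanding of the behavior (the group never leaves its bounds), not AWS code, and apply_adjustment is a name made up for this sketch.

```python
def apply_adjustment(current, adjustment, min_size, max_size):
    """Apply a ChangeInCapacity adjustment, keeping the result within bounds."""
    return max(min_size, min(max_size, current + adjustment))


capacity = 0

# Fifteen scale-up triggers in a row against a group with min 0 and max 10.
for _ in range(15):
    capacity = apply_adjustment(capacity, +1, min_size=0, max_size=10)
print(capacity)  # 10: the upper bound caps the bill

# Twenty scale-down triggers cannot push the group below zero instances.
for _ in range(20):
    capacity = apply_adjustment(capacity, -1, min_size=0, max_size=10)
print(capacity)  # 0
```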

Before I continue, I will say that the whole topic of SPs is, as of this writing, sparsely covered in AWS' documentation. I found some general information, but nothing at the level of detail that most people trying to understand SPs and their nuances would want.

Alright, if everything thus far has gone according to plan, we should be ready to move on to the next step. For this part, we will use the AWS management console. We could, of course, do it via API but I like to use the console whenever possible. So, log into the management console and click on the CloudWatch tab and make sure you are working in the right region.

On the left navigation bar, click on Alarms, then click on Create Alarm. A Create Alarm Wizard modal window will pop up. In the search field, next to the All Metrics dropdown, type "SQS". This will bring up the metrics associated with the queue we built at the beginning of this tutorial. For the sake of this exercise, click on NumberOfMessagesReceived (though you are welcome to try other options/metrics if you wish). After selecting the row, click Continue. Give the alarm a name and description, and in the threshold section set it to ">= 10 for 5 minutes".

In the next step of the wizard, we are going to configure the actions to take once this criterion has been met. Set the "When Alarm State is" column to ALARM, set the "Take action" column to Auto Scaling Policy, and finally set the "Action details" to the scaling group we just created. A new dropdown menu will appear where you choose which policy to apply (see this screenshot -- sorry, but the image was too wide for this blog layout). This will be our up-scale policy.

To set up the down-scale step, click on "ADD ACTION" in the last column of the configure actions step of the Create Alarm Wizard. In this new row, select "OK" from the "When Alarm State is" dropdown menu. Then, just as above, select "Auto Scaling Policy" from the "Take action" dropdown menu, select your AS group in the "Action details" dropdown, and select your down-scale policy from the policy dropdown. Click Continue. In the next step, check that your metrics, alarms and actions are correct. Finally, click on Create Alarm.
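
For those who prefer the API, the same alarm can probably be created with boto's CloudWatch module. The sketch below has not been run against a live account and rests on several assumptions: the Sum statistic, a single 5-minute evaluation period, the QueueName dimension, and the placeholder queue name and policy ARNs. alarm_settings is a hypothetical helper that only assembles the keyword arguments; the commented lines show how I would expect them to be handed to boto.

```python
def alarm_settings(queue_name, scale_up_arn, scale_down_arn):
    """Build keyword arguments for boto.ec2.cloudwatch.alarm.MetricAlarm.

    Mirrors the console walkthrough: ">= 10 for 5 minutes" on
    NumberOfMessagesReceived, scaling up on ALARM and down on OK.
    """
    return {
        "name": "%s-backlog" % queue_name,   # placeholder alarm name
        "namespace": "AWS/SQS",
        "metric": "NumberOfMessagesReceived",
        "statistic": "Sum",                  # assumption: not stated in the walkthrough
        "comparison": ">=",
        "threshold": 10,
        "period": 300,                       # 5 minutes, in seconds
        "evaluation_periods": 1,             # assumption: one 5-minute window
        "dimensions": {"QueueName": queue_name},
        "alarm_actions": [scale_up_arn],     # run when the alarm enters ALARM
        "ok_actions": [scale_down_arn],      # run when the alarm returns to OK
    }


settings = alarm_settings("your-queue-name",
                          "arn-of-AS-UPSCALE", "arn-of-AS-DOWNSCALE")

# With boto installed and credentials configured:
#   import boto.ec2.cloudwatch
#   from boto.ec2.cloudwatch.alarm import MetricAlarm
#   cw = boto.ec2.cloudwatch.connect_to_region('us-west-2')
#   cw.create_alarm(MetricAlarm(**settings))
# The real policy ARNs can be read from the policy_arn attribute of the
# objects returned by conn.get_all_policies(as_group='your-sg-name').
```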

Now that we are completely done setting up the auto scaling, you might want to test it. The easiest way would be to send a couple hundred messages to the queue via API/boto and see how it scales up, then delete the messages and see how it scales down. That is something I might address in a later post.

Hope this tutorial was of help and easy to follow. For comments and suggestions, ping me via Twitter @WallOfFire




