Adam Crosby

Python on AWS Elastic Beanstalk

25 Aug 2012 - Norfolk

AWS Supports Python (and Flask) on Elastic Beanstalk

Amazon recently announced Python (WSGI-based) support on its Elastic Beanstalk PaaS. The announcement specifically mentions Flask, my favorite web framework.

There is a great overview of a very simple Flask app on the Amazon documentation site. It walks you through installing Git and the eb tool, which controls Elastic Beanstalk environments from the command line.

Copy/Paste is Good, but Not Enough

Following that overview exactly will leave you with a working (if functionless) Flask app running on an AWS t1.micro instance.

If you’re looking to deploy an existing Flask app to Elastic Beanstalk, however, there are a few gotchas to watch out for that the overview doesn’t cover (as of yet).

Gotcha #1 - File and Object Name Requirements

For whatever reason, the default Elastic Beanstalk configuration sets up the WSGI environment (the backend is Apache + WSGI) to launch a WSGI object called application from a source file named application.py. Virtually all other Flask tutorials use a WSGI object named app inside a script of your choosing, which you then reference in the web server’s WSGI config. To take advantage of the default Elastic Beanstalk configuration, however, you must name your script application.py, and the Flask object inside it must be named application.

For example, in the Flask tutorial, the following code is in a file named hello.py:

from flask import Flask
app = Flask(__name__)

@app.route("/")     
def hello():         
	return "Hello World!"

if __name__ == "__main__":         
	app.run() 

This Flask application will fail to start on Elastic Beanstalk without some additional configuration changes (more on that later). To use that code as your Flask app, you’d have to rename the file to application.py and change the code to this:

from flask import Flask
application = Flask(__name__)  # Change assignment here

@application.route("/")        # Change your route statements
def hello():         
	return "Hello World!"

if __name__ == "__main__":         
	application.run() 		   # Change all other references to 'app'

Or you can be lazy, like me, and simply make app a reference to application, like this:

from flask import Flask
application = Flask(__name__)
app = application

@app.route("/")     
def hello():         
	return "Hello World!"

if __name__ == "__main__":         
	app.run() 

This has the benefit of not requiring you to replace every reference to app in your codebase.
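
To see why the tutorial’s hello.py fails while either variant above works, here is a rough sketch of the lookup a WSGI loader performs. It is not Elastic Beanstalk’s actual code: load_wsgi_callable and the sample scripts are hypothetical, and a plain WSGI function stands in for the Flask object so the sketch runs without Flask installed. The one behaviour carried over from the docs is that the loader looks for a callable named application (which is also mod_wsgi’s default callable name).

```python
import importlib.util
import pathlib
import tempfile


def load_wsgi_callable(script_path, callable_name="application"):
    """Execute script_path as a module and return its WSGI callable.

    A rough stand-in for what a WSGI loader does; not Elastic Beanstalk code.
    """
    spec = importlib.util.spec_from_file_location("wsgi_entry", str(script_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script top to bottom
    try:
        return getattr(module, callable_name)
    except AttributeError:
        raise LookupError(
            f"{script_path} has no callable named {callable_name!r}"
        ) from None


# A plain WSGI function stands in for the Flask object, so no Flask is needed.
TUTORIAL_STYLE = '''
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World!"]
'''
ALIASED = TUTORIAL_STYLE + "application = app\n"

with tempfile.TemporaryDirectory() as tmp:
    script = pathlib.Path(tmp, "application.py")

    script.write_text(TUTORIAL_STYLE)
    try:
        load_wsgi_callable(script)
    except LookupError as err:
        print("fails:", err)

    script.write_text(ALIASED)  # same file, now with the alias line
    print("works:", load_wsgi_callable(script).__name__)  # works: app
```

The alias works because the loader only checks that the name application exists in the module; it doesn’t care what else the same object is called.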

A third way to accomplish (at least part of) this is to use the .ebextensions/python.config file covered in the next two gotchas and set an option called WSGIPath. The documentation says it points to the WSGI script (so your file no longer has to be named application.py), but mentions that the script must still provide an ‘application’ callable. I assume this means you must still name the Flask WSGI object application within that file. This is covered in painfully little detail in the AWS docs.
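
If the docs mean what I think they do, the effect of WSGIPath can be sketched as a small lookup over the parsed option_settings. The namespace string aws:elasticbeanstalk:container:python and the wsgi_entry_point helper are assumptions for illustration only; the point is that WSGIPath changes which file gets loaded, not the name of the callable inside it.

```python
# Assumed namespace for Python container options (see the lead-in).
PYTHON_NAMESPACE = "aws:elasticbeanstalk:container:python"
DEFAULT_WSGI_PATH = "application.py"


def wsgi_entry_point(option_settings):
    """Return the (script, callable) pair implied by parsed option_settings.

    Hypothetical helper: WSGIPath, if present, replaces the default script,
    while the callable name is assumed to stay 'application'.
    """
    python_options = option_settings.get(PYTHON_NAMESPACE, {})
    return python_options.get("WSGIPath", DEFAULT_WSGI_PATH), "application"


# Dicts standing in for parsed YAML from .ebextensions/python.config.
print(wsgi_entry_point({}))
# ('application.py', 'application')
print(wsgi_entry_point({PYTHON_NAMESPACE: {"WSGIPath": "hello.py"}}))
# ('hello.py', 'application')
```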

Gotcha #2 - Static Files

The Flask tutorials cover static files pretty early on, and teach you how to use the url_for() function to generate URLs for static content such as JavaScript, CSS files, or images that you generally don’t want to pipe through the WSGI process.

These all break when migrating an existing application to Elastic Beanstalk because, as with the WSGI application name, you don’t get access to the web server configuration to change them. Instead, you must use a YAML configuration file in a hidden directory to set an Elastic Beanstalk customization.

To get the traditional /static/ URL working for your static content on Elastic Beanstalk, make the following changes (this assumes an otherwise working, configured Elastic Beanstalk environment):

  1. Create a directory called .ebextensions in the root of your Git repository, and change directories into it:

    $mkdir .ebextensions
    $cd .ebextensions
    
  2. Use your favorite text editor to create a file named python.config with the following YAML content:

option_settings:
  "aws:elasticbeanstalk:container:python:staticfiles":
    "/static/": "static/"

(Please note: YAML requires spaces, not tabs, for indentation. Tab characters will cause the parser to error out.)

  3. Stage and commit the changes to your Git repository:

    $git add .
    $git commit -m 'Adding python.config for static content mapping'
    [master (root-commit) 7d30bf3] Adding python.config for static content mapping.
     1 files changed, 3 insertions(+), 0 deletions(-)
     create mode 100644 .ebextensions/python.config
    
  4. Push the new deployment to Elastic Beanstalk:

    $git aws.push
    Counting objects: 7, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (4/4), 422 bytes, done.
    Total 4 (delta 1), reused 0 (delta 0)
    remote: 
    To https://snip@git.elasticbeanstalk.us-east-1.amazonaws.com/repos/snip-env
       aaf2428..03d8596  HEAD -> master
    
  5. Profit, as

 url_for('static', filename='css/main.css') 

will now function as expected!
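
With Flask’s default static URL path, url_for('static', filename='css/main.css') produces /static/css/main.css. The staticfiles mapping then decides whether the web server serves that request straight from disk or hands it to your WSGI app. Here is a minimal sketch of that decision as I understand it; resolve_static is a hypothetical helper, not AWS code.

```python
import posixpath

# Mirrors the "/static/": "static/" entry in python.config.
STATIC_MAPPINGS = {"/static/": "static/"}


def resolve_static(url_path, mappings=STATIC_MAPPINGS):
    """Return the file the web server would serve for url_path, or None if the
    request would fall through to the WSGI app. Illustrative only."""
    for url_prefix, directory in mappings.items():
        if url_path.startswith(url_prefix):
            normalized = posixpath.normpath(url_path[len(url_prefix):])
            # Refuse paths that would escape the static directory.
            if (normalized == ".." or normalized.startswith("../")
                    or normalized.startswith("/")):
                return None
            return posixpath.join(directory, normalized)
    return None


print(resolve_static("/static/css/main.css"))  # static/css/main.css
print(resolve_static("/"))                     # None: Flask handles it
```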

Gotcha #3 - Filesystem Access and ‘Asset’ Staging

This gotcha might be my fault, as the documentation here is pretty hard to follow (or just doesn’t exist). That said, the gotcha is that there doesn’t seem to be a set way to provide filesystem-based assets that live outside the web root for your code to access, e.g. a SQLite database file.

Hopefully this is just a documentation shortcoming on a brand new product. Until the documentation is updated (or a kind reader points me to the right place!), I’ve developed a workaround based on an unexplained example from the AWS documentation.

If you refer to Step 2 above, and the .ebextensions/python.config file, you’ll see the first part of the answer. That file can contain a number of other entries to customize your environment. AWS is a huge fan of ‘bootstrapping’ instances, so that auto-scaling and other advanced features can be fully utilized. Per-machine state is heavily frowned upon.

The Python container customization documentation is pretty sparse, but shows a tantalizing snippet of what you can do.
The packages: directive lets you specify which yum packages you need installed. This is useful if you need any kind of special support library beyond the Python packages you capture with pip freeze. The bit I was most interested in, however, is the commands: section of the file, which the docs demonstrate with some Django database administrivia.

My approach to this gotcha was to use the commands: section of the file to run a Bash script I wrote that copies files out of my Git repository and into known locations. While it doesn’t appear to be documented, Elastic Beanstalk seems to execute the listed commands as if they were in an rc.d directory, so 01my_script will execute before 99last_one.sh.
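
If the ordering really is rc.d-style, it is most likely a plain lexicographic sort on the command names, which is why numeric prefixes should be zero-padded. A quick illustration, assuming that sort order and using hypothetical command names:

```python
def execution_order(command_names):
    """Order commands by name, assuming plain lexicographic (rc.d-style) sorting."""
    return sorted(command_names)


# Hypothetical command names, not taken from any real config.
names = ["99last_one", "01my_script", "2unpadded", "10second"]
for name in execution_order(names):
    print(name)
# 01my_script
# 10second
# 2unpadded
# 99last_one
```

Note that 2unpadded runs after 10second, because the names are compared character by character rather than numerically.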

To keep things simple, I followed the example’s lead: I created a scripts directory in my Git repository and put a Bash script there. The script is executed from the root of the Git repository, so it should reference everything via relative paths.

My final .ebextensions/python.config file looks like this:

commands:
  01prepenvironment:
    command: "scripts/01prepenvironment.sh"
option_settings:
  "aws:elasticbeanstalk:container:python:staticfiles":
    "/static/": "static/"

My scripts/01prepenvironment.sh script simply copies my SQLite database file into /tmp (chosen because it was simple, and I know I have read/write access there).
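
My script is Bash, but the same copy step is easy to express in Python if you prefer. A rough equivalent is below. The data/app.db path and the stage_database helper are hypothetical, and like the Bash version it assumes it is run from the root of the repository.

```python
import pathlib
import shutil

# Hypothetical paths: the repository path is relative because, like the Bash
# version, the script is assumed to run from the root of the repository.
REPO_DB = pathlib.Path("data/app.db")
STAGED_DB = pathlib.Path("/tmp/app.db")


def stage_database(src=REPO_DB, dest=STAGED_DB):
    """Copy the SQLite file out of the deployed repo to a known location."""
    dest.parent.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```

Calling stage_database() from a script listed under commands: would do the same job as the Bash version.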

Important

It’s critical to understand that the Elastic Beanstalk images are all ephemeral, in the ‘old’ style of AWS AMIs. This means that nothing on an instance’s filesystem will survive a deployment, a redeployment, or a stop of the environment or instance. My SQLite database file was just read-only content stored in a structured way; the Flask app never needs to modify any of the data. If you do need a datastore that you can modify (and need those modifications to persist), check out one of AWS’s hosted data services (RDS, SimpleDB, DynamoDB, etc.), or run your own datastore on another machine (e.g. an EBS-backed AWS instance).
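
Since the staged database is only ever read, it’s worth opening it read-only, so an accidental write fails loudly instead of appearing to succeed and then silently vanishing at the next deployment. A sketch using the standard library’s sqlite3 module and SQLite’s mode=ro URI parameter; open_readonly is a hypothetical helper, and the staged path you pass in (e.g. /tmp/app.db) is up to you.

```python
import pathlib
import sqlite3


def open_readonly(db_path):
    """Open an existing SQLite file read-only.

    Writes raise sqlite3.OperationalError, and with mode=ro a missing file
    is an error rather than a freshly created, empty database.
    """
    # as_uri() needs an absolute path and percent-encodes unusual characters.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    # uri=True tells sqlite3 to interpret the string as a SQLite URI.
    return sqlite3.connect(uri, uri=True)
```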