Wednesday, September 4, 2013

For Schema Migrations Use App Engine Task Queue instead of Backends

Every application under iterative development is eventually going to need a schema migration of some sort. In our case, we needed to realign key/id fields and add new common fields to all our models, which then had to be populated. Given that even the most innocuous migration will probably take longer than the maximum time allowed for an HTTP request, our inclination was to resort to a GAE backend to do the hard work in the background, independent of HTTP request deadlines. The problem with GAE backends is two-fold: the setup overhead, and the dismal documentation and examples that the App Engine team provides (although, to be fair, the former is exacerbated by the latter).

Backends have their own .yaml config file with several options whose purpose, and whose side effects (if any), are not entirely clear. The documentation for backends.yaml is here and, as you can see, chasing down the different choices for a given option can feel like going down the rabbit hole. To make matters worse, you have to "wire" your app's app.yaml to backends.yaml and figure out how the two files differ and what role each plays with respect to backends. Yet another issue that is not entirely clear is how to kick off a backend instance, assuming of course that you have all the pieces in the right place. The docs, again, are dismal, in my opinion. One passage mentions /_ah/start as the way to kick-start your backends; however, the docs also mention that a 404 HTTP error code is considered a success. Since there isn't a way to see whether the backend is doing anything or not, it is outright confusing to consider a 404 a success. Eventually, after much searching, I came across a way to get the process going via a front-end request; unfortunately, this ties the backend to the front-end deadlines.

As I was about to /flipdesk, I walked through the dark caverns of App Engine's newsgroup and bumped into a post about best practices for schema migrations. The post mentioned this article, posted in December 2012, that, lo and behold, addresses precisely the issues at hand. While it seemed trivial to copy-pasta, I knew it would not apply to our app as-is, since we use App Engine's ndb datastore abstraction module. Fortunately, someone took the time to write an ext.db -> ext.ndb cheatsheet that made matters a lot easier. After moving things around a bit, switching the syntax to ndb, and adding a couple of simple lines to app.yaml, I fired up my local dev server and tried it. It worked from the get-go. I pushed to our staging server on App Engine's infrastructure, which contains ~40K records, fired the process up, and some ten minutes later the migration was complete. The best part: I could follow its progress and logs through the admin console, including how deep the Task Queue was and which task was next in line.
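
For readers who want the gist of the pattern, here is a minimal sketch of it: process one batch of records, then enqueue a follow-up task that picks up where the previous one stopped, so no single task runs long. This is not our actual migration code. It uses in-memory stand-ins so it runs anywhere, and the names datastore, task_queue, fetch_page, migrate_batch, run_queue and the created_by field are all hypothetical. On App Engine, the enqueue step would presumably be a Task Queue call (for example through the deferred library) and the page fetch a cursor-based ndb query.

```python
from collections import deque

BATCH_SIZE = 100  # hypothetical batch size, small enough to finish quickly

# In-memory stand-ins (assumptions for illustration only).
datastore = [{"id": i} for i in range(250)]  # records missing the new field
task_queue = deque()                         # stand-in for the Task Queue


def fetch_page(cursor, page_size):
    """Return (records, next_cursor, more), mimicking a cursor-based query."""
    page = datastore[cursor:cursor + page_size]
    next_cursor = cursor + len(page)
    return page, next_cursor, next_cursor < len(datastore)


def migrate_batch(cursor=0, num_updated=0):
    """Migrate one batch, then enqueue the next batch if records remain."""
    records, next_cursor, more = fetch_page(cursor, BATCH_SIZE)
    for record in records:
        # Populate the (hypothetical) new common field if it is missing.
        record.setdefault("created_by", "unknown")
    num_updated += len(records)
    if more:
        # Chain the next batch as a separate task instead of looping here.
        task_queue.append((migrate_batch, (next_cursor, num_updated)))
        return None
    return num_updated


def run_queue():
    """Drain the queue the way a task runner would, one task at a time."""
    result = None
    while task_queue:
        func, args = task_queue.popleft()
        result = func(*args)
    return result


task_queue.append((migrate_batch, (0, 0)))
total = run_queue()
print("migrated %d records" % total)
```

Because each task handles only one batch and hands the cursor to the next, every task stays well within its deadline, and the queue depth gives a rough idea of how the migration is progressing.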

In all, setting up, changing code, and testing the schema migration through Task Queue took roughly 1/5th of the time I had invested thus far in trying to get this process running under a backend. So my recommendation, based on my experience, is to avoid using backends unless you absolutely have to, and instead leverage Task Queue for all other asynchronous/long-running processes.
