Wednesday, October 27, 2010

Updating AppEngine Kinds without taking your site down

I recently had the need to change one of my datastore Kinds. In the past when I have done this, I have shutdown the site and migrated every entity of the kind to the new version of the Kind. However my application is getting to the point that it takes a long time to do that and I didn't want the site down that long. After reading several articles on the subject, most notably this one:, and discussing the matter in the #appengine irc channel on freenet, I had a plan.

The first thing I did was add a new property to my Kind called version, next I added an __init__ constructor for the class. I had previously thought you were not supposed to do that with App Engine, but Nick Johnson on the irc channel said as long as you are careful it would be the right thing to use in this case.

The new constructor has a version check, and if it's the old version the upgrade on the entity is performed. The super call to the base constructor must be called first.

class NTrain(db.Model):
    version = db.IntegerProperty()
    def __init__(self, *args, **kwargs):
        super(NTrain, self).__init__(*args, **kwargs)
        #Since this is the first time I am using version just do a boolean check
#otherwise you would want if self.version < 2: or whatever you version number is.
        if not self.version:
            self.version = 1
            #do upgrade

Next I changed my code to use the new version of the entities.

Then I set up the code that creates all new instances of the Kind to use the new version number.

Finally I will (I haven't done this part yet) run a mapper over the entire kind that just has to load the entity and save it.

That last step is technically not necessary, but by doing that I will guarantee that all entities of the kind have been upgraded to the new version, and once that has been done I can remove the version checking code from the constructor.

You do not have to use a version number to decide that an upgrade is necessary. You could also use an implicit method of upgrading, such as checking if a new property exists on the entity in the datastore and if not then run the upgrade. For my use case I decided that the explicit check would be simpler and more robust.

Tuesday, September 7, 2010

Datastore lessons learned the hard way

With the latest release of the Railroad Empire game I have reduced my App Engine CPU usage by about 66%. Looking back on why my CPU usage was so high before, I have come to the conclusion that when I originally designed my application I made some poor datastore design decisions and fixing those design flaws drastically decreased my CPU usage. So below I am detailing some of the important lessons I have learned.

Use Entity Groups to intelligently partition your data
According to the App Engine documentation you should only use Entity Groups for transactions.
 Only use entity groups when they are needed for transactions. For other relationships between entities, use ReferenceProperty properties and Key values, which can be used in queries.
I have discovered that you may also want to use Entity Groups in partitioning your data, for example placing all the data for a player in a game in an Entity Group with the player as the parent. The documentation alludes to doing this later in the same page.
A good rule of thumb for entity groups is that they should be about the size of a single user's worth of data or smaller.
The problem is that the way the documentation reads is that you should only do that if you are actively using transactions on that data. First I have found that you may start out thinking that you won't be doing transactions and then you find that you do need to do transactions on the data. Second, you can use your parent relationship in conjunction with your Key Names for establishing unique keys.

Make intelligent use of your Key Names.
This is the big one. No matter what you do, don't just go and use a generated key for every one of your entities. Whenever it makes sense use a key name that uniquely identifies the entity using data within the entity. You can often use entity groups to help create intelligent unique keys.

Here's why this is important, queries on key names are so much faster then queries on full entities. If you have a screen that displays a list of your entities that a user can select one of to get a deeper view of that entity, if you do a full query on your list screen you are loading every single entity into memory just to get the information necessary to click on it. Sometimes this can't be avoided, and when it can't you can use Memcache to cache the list for you, but if at all possible only display the information in the key name.

The other place that intelligent use of Key Names can help you, and where it helped me the most was to have my key names match some static lookup data that I used specifically in my maps. In my application I use Google Maps for displaying the Cities, stations, and routes of your railroad. There is a fixed set of cities and routes you can choose from that are all saved in an XML file. Before I made the most recent change you had to load each station entity to display its icon on the map because the city the station referred to was stored as a property in the entity.

Since you can only have one station per city I changed the datastore so that the key name for the station was the city being referenced. In order for this to work I had to set the Station in an Entity Group with the player as the parent. Now whenever I want all of a players stations, but I only care about what city the station is in, which is about 95% of the time, I don't have to load the large bulky station object.

There is another way you could accomplish this same goal. You could de-normalize your data and store a list of the players stations in the player object. This approach will work, but be wary it has several drawbacks. You can have issues with index bloat, as well as entity size issues. remember each entity can only have 1000 index rows each and an entity can only be 1MB in size.

How to fix your ailing datastore
Luckily if you have these sorts of woes in your datastore, or need to refactor it for any reason, all is not lost. There are several ways to move data around in the datastore without too much interference. There is a great project for doing the map part of a map-reduce over here:
For datastore refactoring you only need the map part of map reduce. Beware that sharding of the data can be problematic in the current implemntation.

Another good alternative is to do the following, which ends up being a poor mans map. Have an admin only url that launches a task (we'll call it the launcher task) to iterate over your data(keys only) you want to change using a cursor. Have the launcher task launch a new task for each entity you want to change to do the actual work of changing that entity (the worker task). Have the launcher task execute a reasonable number of worker tasks (I used 100) and then add itself back to the queue with its cursor as a parameter and then end. Now the launcher task will iterate over your entire query launching tasks to do the real work. The worker tasks can now make each individual datastore change.

Be sure to stop your cron jobs and other task queues while you are updating your datastore.

I hope that some of these lessons I have learned help other people.

Monday, August 16, 2010

Solving transaction contention issues in Ranklist

This weekend I begin seeing the following message on my dashboard:

08-13 08:21PM 44.950 Transaction collision for entity group with key datastore_types.Key.from_path(u'ranker', 23150L, _app=u'railroadempire'). Retrying...

After a little digging I discovered that the transaction that gets run by ranklist is wrapped around the entire ranker. In Google App Engine terms that means that each individual ranker is it's own entity group and the transaction is on the ranker's entity group. What this ends up meaning is that you can only write one rank into your ranklist at a time.

My game has picked up that there are enough people playing that more than one update to the ranking was trying to be processed at a time. This was causing the transactions to need to be retried and adding significant weight to my CPU usage.

After a quick post on the ranklist Google group I got a great response from Bartholomew Furrow:

I'm only guessing, but probably the best way to deal with this is batch your updates.  Whenever a scoreboard row changes in score, set a 'dirty' field in it to True; then, separately, have a cron job continually querying for rows with 'dirty' set to true and then running set_scores on the ranker for all of them.  That way you get the advantage of having a lot of updates at the same time, and during periods of heavy use you just fall behind a bit instead of failing requests.

Beyond that I think you'd need to get into specifics of the tree structure and whether certain nodes happen to be getting hit more often than the others.  For example, if you're calling num_ranked_users (or whatever it's called) a lot, you could consider slapping a memcache on that with an expiry time of 20s so your root node doesn't get hit nearly as often.
Well I chose the batch method that he mentioned above, and here is what I did:

  1. I added a rankChanged Boolean Property to the player entity. 
  2. Whenever the player's cashOnHand changes I set the rankChanged to true.
  3. I removed the old individual calls to setRanking in ranklist
  4. I set up a Cron Job to run every minute syncronized in the cron.yaml file
  5. I added a request handler to do the work:
def get(self):
        playerscores = {}
        count = 0
        playerquery = model.Player.all()
        playerquery.filter("rankChange =", True)
        ranker = model.Player.getRanker()

        for player in playerquery.fetch(50):
            playerscores[str(player.key())] = [player.cashOnHand]
            player.rankChange = False
            count += 1
        logging.debug("Set Rank for %s players" % count)

This solved the problem nicely, but not perfect. Now the rankings are no longer real-time they are delayed about a minute. I had to set the fetch size to 50 because after that you get timeouts and memory issues. Currently I am only using about 38 of the 50 in a request. I should add to this function so that if the count is 50 I launch a Task Queue request to run this same code, that way if there are more then 50 it won't back up until the next Cron job runs.

Hope this helps someone else.

Thanks Bartholomew Furrow for the help.

Sunday, August 1, 2010

Developing Games on Google App Engine

Yesterday at the Cleveland Game Developer Meetup Tech Workshop I gave a talk on using Google App Engine for games. I discussed many aspects and gave a brief overview of Google App Engine for those not familiar with it. In the talk I told people about the tools I used to create Railroad Empire and I presented the code for a simple demonstration multi-player text adventure.

You can get the code for the sample from here:

You can try it out here: