Monday, July 11, 2011

Writing data to Google Fusion Tables from App Engine

This weekend I was asked by some potential investors to supply some statistics for my game. The information was embedded in my datastore in the player and transaction objects. To extract the data, I decided to iterate over each player in the game. This could have been done with a map part of map/reduce, but I just wrote a simple task queue job to walk the data serially for simplicity since there wasn't a ton of data and speed wasn't a huge issue, the fifteen minutes it took was fine.

To access Fusion Tables I downloaded the Python library for Fusion Tables:
http://code.google.com/p/fusion-tables-client-python/

Then I built the table I wanted to create using the Fusion Tables UI:
http://www.google.com/fusiontables/Home?pli=1

Then I simply setup my task to insert a fusion table row for each player entity with the desired data in it.

To authenticate I user Client Login:

from authorization.clientlogin import ClientLogin
import ftclient

token = ClientLogin().authorize(username, password)
ft_client = ftclient.ClientLoginFTClient(token)

I created a template for the insert I wanted to do


ftquerytemplate = """insert into 1116661 (created_date, login_date, first_transaction_date, last_transaction_date, player_status, number_of_transactions, length_of_play, time_to_txn) 
VALUES ('%(created_date)s', '%(login_date)s', '%(first_transaction_date)s', '%(last_transaction_date)s', '%(player_status)s', %(number_of_transactions)s, %(length_of_play)s, %(time_to_txn)s);"""



Then I just iterated over my players, for each 100 players, I saved a cursor and launched a new task with that cursor. I stacked up the inserts into one string and sent them as a single query at the end. There are some things you have to watch out for:

  • Null values are entered as empty strings ''
  • When entering dates, Fusion tables will enter either just the date or the entire timestamp, however there is no way in fusion tables that I can find to aggregate on a date when you have a timestamp, so make sure you do that conversion in code.
  • If you want a number field to be null you still pass it an empty string ''
Below is the complete controller I used to get the data and place it in Fusion tables. There are a lot of things that could be done by mixing fusion tables and Google App Engine. This was just a quick way for me to solve a problem quickly, that's one of the things I love about these technologies, they are very easy to try out.




from controllers import *
from models import model
import datetime
import logging
import ftclient
from google.appengine.api import taskqueue
from authorization.clientlogin import ClientLogin

ftquerytemplate = """insert into [mytablenumber] (created_date, login_date, first_transaction_date, last_transaction_date, player_status, number_of_transactions, length_of_play, time_to_txn) 
VALUES ('%(created_date)s', '%(login_date)s', '%(first_transaction_date)s', '%(last_transaction_date)s', '%(player_status)s', %(number_of_transactions)s, %(length_of_play)s, %(time_to_txn)s);"""

class UserReporting(BaseController):
    def get(self):
        self.handleReporting()
        
    def post(self):
        self.handleReporting()
        
    def handleReporting(self):
        token = ClientLogin().authorize("[myusername]", "[mypassword]")
        ft_client = ftclient.ClientLoginFTClient(token)
        playerquery = model.Player.all()

        last_cursor = self.request.get('cursor')
        if last_cursor:
            playerquery.with_cursor(last_cursor)

        fetched = playerquery.fetch(100)
        logging.debug("Fetched %s records" % len(fetched))
        ftquery = ""
        for player in fetched:
            #Created data in Fusion Tables 
            playerdata = {'created_date':player.createdDate.date(),
                          'login_date':player.lastLoginDate if player.lastLoginDate else "",
                          'player_status':'inactive' if player.inactive else "active"}
            if player.lastLoginDate and player.createdDate:
                playerdata['length_of_play'] = (player.lastLoginDate - player.createdDate.date()).days
            else:
                playerdata['length_of_play'] = 0
                          
            count = 0
            first_txn = datetime.datetime.max
            last_txn = datetime.datetime.min
            for txn in player.transaction_set:
                count += 1
                if txn.timestamp < first_txn:
                    first_txn = txn.timestamp
                if txn.timestamp > last_txn:
                    last_txn = txn.timestamp
            if first_txn != datetime.datetime.max and player.createdDate:
                playerdata['time_to_txn'] = (first_txn - player.createdDate).days
            else:
                playerdata['time_to_txn'] = "''"

            playerdata['number_of_transactions'] = count
            playerdata['last_transaction_date'] = '' if last_txn == datetime.datetime.min else last_txn 
            playerdata['first_transaction_date'] = '' if first_txn == datetime.datetime.max else first_txn
            ftquery += ftquerytemplate % playerdata
        logging.debug("SQL: %s" % ftquery)
        ft_client.query(ftquery) 
        if len(fetched) == 100:
            taskqueue.add(url='/reporting/userreporting', params={'cursor':playerquery.cursor()})