Thursday, July 16, 2015

Don't test with SQLLite when you use Postgres in Production

Yesterday, I was looking for some help trying to have a clean way to roll back database transactions at the end of a test in a Python TestCase. So, naturally, I started searching and found some posts on Stack Overflow and started to read them. I was instantly mortified by what I read.
But in any case, an in-memory database (SQLite makes this easy) is the best choice for unit tests. 
Huh? I, at least, will listen and have a nice healthy debate when people suggest you should mock out every call to the database in the unit tests because if you are talking to the database it's not a unit test its a functional test. That argument is mainly one of semantics of what you call the test. In the case of using a different database engine in a testing environment than a production environment, why would you want to do that?

What's worse is the next stack overflow question I went to had almost the same answer by another person.
You can simply create an in-memory SQLite database and bind your session to that.

So, first, I wanted to stand on the rooftop and yell at the top of my lungs to anyone who would listen. This is a really bad idea. Here's why.

  1. SQLite has different SQL semantics than Postgres, so a query written on SQLite may not run in Postgres
  2. SQLite has different bugs than Postgres, so just because it works in SQLite does not mean it will work in Postgres. Also the reverse is true, just because you can't get it to work in SQLite doesn't mean it won't work in Postgres
  3. Postgres has way more features that SQLite. Try doing a CTE query in SQLite. You can't use any of these features in your application if you test in a different database than you use for your production environment. (It ends up in 2014 SQLite added CTEs to their software, however there are differences, which is the point, for instance it does not appear that you can do insert or update in a SQLite CTE).
  4. How can you test deployments if your test environment uses a different database than your production environment?
  5. How can you test that your application works with the newest version of the database before you push it to production?
This sentiment about testing in a different database than you use in production is not just limited to stackoverflow it seems. I found a document on leading cloud service provider's site that said this:


When To Use SQLite
  • Testing:
It is an overkill for a large portion of applications to use an additional process for testing the business-logic (i.e. the application's main purpose: functionality).
This is even more strange coming from them since one of the valid reasons cloud service providers give for using their services is that you can have a test environment that mirrors your production environment as closely as possible.

I specifically mentioned SQLite and Postgres, but in reality you could replace any two technologies and the arguments don't change. You wouldn't want to test on SQLite and use MySQL in production, nor would you want to test on Postgres and use Oracle in production.

So remember don't develop and test against one database and release against a different database.

Edited Aug. 4
Fixed spelling error for SQLLite -> SQLite.
SQLite does have CTE support since 2014.

Thanks for all the great comments, and the godd discussion this is creating within the community.

17 comments:

  1. I was at a talk about Database testing at PyOhio this last weekend and heard someone else suggest this again as a viable testing option. The reasoning that was given for why you would use an in memory sqllite database instead of your production database was the performance.

    Would it really be more performant when you really need to do the testing on the production database eventually anyway. Do you want to test in the right environment sooner or later when it will cost way more and take way longer to fix.

    ReplyDelete
  2. CTEs work in recent SQLites.
    But if you really want to test the SQL parts of your application, you are obviously right.

    ReplyDelete
  3. For unit testing it is needed to have a quick feedback cycle. For this purpose it is great to use an in-memory database, or mock. But when you want to be sure what you deliver is correct, you would have a set of integration tests. This should run against your final database implementation.

    ReplyDelete
    Replies
    1. So what you're basically saying is that your entire Unit Test runner needs to "switch" to pick between "in-memory" mode and "against a DB" mode. And you also need to mock your ORM for the "in-memory" mode.

      Delete
  4. Usually most of the tests will work on sqlite although you're using pgsql in production if you write standard sql queries. But still that's not safe. There are obvious cases where you need test with PG also like: postgis, jsonb and other pg specific. Also there's a case I got burned a few times: SQLite doesn't know about varchar(n) but will create text field (65k size); so you'll test will pass while inserting 1000 chars in production that will fail

    ReplyDelete
    Replies
    1. For postgis you can try H2GIS http://www.h2gis.org/
      it is an extension of the H2 database.

      Delete
  5. I believe the context in the Stack Overflow answers you refer to are testing a database layer that uses an ORM. For instance the Python ORM SQLAlchemy abstracts away the underlying db engine. If you stick to using the ORM (and not execute any raw SQL from your app), then testing against inmemory SQLite might work fine. YMMV.

    ReplyDelete
    Replies
    1. I generally agree with this, but I also think OP has a legitimate concern -- it would be bad to deploy to production without ever testing against your real production stack.

      I'd propose running your tests locally against SQLite for quick feedback, then having continuous integration run them again, against Postgres, when you push. That way, engineers aren't waiting around for slow tests, but they'll still get notified if something doesn't work in Postgres, so that they can fix it. Best of both worlds.

      Delete
    2. Look, if this is really important AND you're not using any custom SQL, then why don't you just use the Mocks that ship with the SQLAlchemy ORM.

      I mean, why are you even using SQLite? If you want in-memory mocks and you've already acknowledged that you can't test raw SQL, then you should really just be mocking the ORM.

      (and adding a switch to run against a real DB for deployable builds)

      Delete
  6. I completely agree. I believe it takes just one minor difference between two DBMS implementations which will eventually eat up all the precious time you 'saved' by summing those milliseconds in the unit tests.

    ReplyDelete
  7. SQLite is the worst choice for this. It's very non-standard. H2 is probably the best for this kind of thing (IF you really need to do this).

    ReplyDelete
  8. I think this type of topic makes for a great interview question to avoid undesirable engineers. Anyone who wants to switch out a part in test and replace it with something different in production should not be working on anything important.

    ReplyDelete
  9. This comment has been removed by the author.

    ReplyDelete
  10. NASA rule of thumb I read somewhere.

    "Test what you fly. Fly what you test."

    ReplyDelete
  11. @Michael: I agree with you 100%.

    We actually have our Unit Tests (call them "integration" tests?) running against an actual local PGSQL instance. We have caught so many important things this way, including actual bugs in PGSQL.

    I think the SQLite idea is a quick way to create a fake in-memory database so that you can run tests quickly. Especially in code where all of the interactions are done via an ORM.

    What people really miss here is if you can only test ORM code using the in-memory DB, then you should really be mocking the ORM, not inserting a bad DB substitute.

    ReplyDelete
  12. This comment has been removed by the author.

    ReplyDelete
  13. "Fly what you test and test what you fly."

    SQLite itself lives by that motto. Testing of SQLite is done using exactly the same compiler options as is used for the release. This is critical for reliability.

    But it is also ok, and in fact usually good, to do additional testing in non-release configurations. For example, SQLite is also tested with the -DSQLITE_DEBUG compile-time option which enables assert() statements. This can find problems that were not detected by a test using the release configuration.

    The same principle applies to applications. If you are using PostgreSQL in production, the definitely, definitely test on PostgreSQL. And in fact, test on exactly the same version of PostgreSQL, with exactly the same configuration. Do this without fail.

    But at the same time, it can also be helpful to test using SQLite (or another RDBMS) as well. You might find bugs that way that you missed otherwise. For example, suppose your application has a query that omits the ORDER BY clause but still depends on a particular output order. Maybe PostgreSQL just happens (by luck) to output the result in the order you expect. But SQLite might return the results in a different order, thus revealing the bug.

    SQLite might also be useful for unit tests, or development testing, since setting up or duplicating an SQLite database is simpler than doing the same in PostgreSQL. With SQLite, a database is just an ordinary disk file, which can be copied easily, sent as an email attachment, or checked into a version control system. Managing test databases in PostgreSQL takes more work. If testing is easier to do, then more testing is likely to occur.

    So, while it is vitally important to test your application in an exact replicate of your production environment, that does not preclude also running tests in other environments which are more convenient.

    (The author is this post is the creator and lead developer of SQLite.)

    ReplyDelete