Michael Robellard's Blog: 2015

Sunday, October 18, 2015

Excited for Postgres 9.5(Locking)

I was reading about some of the new things coming up in the Postgres 9.5 release and I was very excited about some new locking options that will make concurrency even better in Postgres.

I remember several years ago I thought it would be a good idea to use Postgres as Queue service. At the time I did not want to add a new Service to our infrastructure, and I though Postgres would be up to the challenge. This was the first time I ran into the problem that the first locking feature we are going to talk about solves.

In my early days of trying to make this work I did something along the lines of

select * from queue_table
where processing = false
limit 1
for update;

The problem with this is that the first call will lock the first row for writing and the second call will sit and wait for the lock to release on the row before continuing. In Postgres 9.5 we get a new option:

select * from queue_table
where processing = false
limit 1
for update skip locked;

Problem solved. I still wouldn't use Postgres as a Queue system given the many things I have learned since then, however there are other places where you will run into this problem.

Here is another one that would save us a lot of headaches at my previous employer. We had data where you had a data_table that you joined on the users table. You would then want to select for update the data in the data_table while joining on the users table. This would previously row lock the row in the user table and could cause issues.

select * from data_table
join users on data_table.user_id = users.id where data_table.id = 12345
for update;

The new solution to this is:

select * from data_table
join users on data_table.user_id = users.id where data_table.id = 12345
for update of data_table for share of users;

I hope I have shown you some new things that are going to make Postgres just that much better.

Monday, August 17, 2015

Excited for Postgres 9.5(Row Level Security)

Every year around this time I get very excited about the new features that are getting ready to be released in the next version of Postgres. This year is no exception, There is a trove of new features that have me salivating. Today I am going to talk about one of these new features, Row Level Security. I hope to talk about some of the other features soon.

As an architect/developer of SaaS apps I can't help but be intrigued by the new Row Level Security model. Right now SaaS apps typically implement security either at the application level or by having seperate databases or schemas per customer. Each of these methodoligies has its pluses and minuses, but now we will have a new option to consider.

Application Level Security

This is what most people use when they write applications. You simply add a column to your tables to limit them to a specific user or customer and make sure you always query with those columns in your where clause or join somewhere. This works great 99% of the time, but what happens when some big customer comes up to you and wants to connect their data in your app to a custom reporting system, or they exercise the cluase in their contract where they want all their data from your database becuase they are leaving you? Another place where this could be an issue is what if your application layer is vulnerable to a SQL injection attack or some other nefarious attack. With application level security the attacker now has access to all the data in your database.

Seperate Databases/Schemas

This model of development solves the problems from above very effectively. It also has the added benfit of allowing you to shard your data horizontally across many machines which can be very advantageous as your company grows. There are however tradeoffs, and one of the big ones is complexity. You now have to track which database and schema to connect to for each user/customer, you have to figure out how to mange pools of connections for each database to each web box. Superuser/Administration activities become a big problem. Shared data is now a problem. Running analytics and metrics on your system is much harder.

Row Level Security

RLS gives you the ability to have the advantages of both with the exception of a convienent sharding mechanism. Using RLS you set up policies for tables which are nothing more then where clauses or check constraints that are applied on the table for every query. Now that you can move those security checks to the database It becomes easy to give access to only a customers data. There is also an option on pg_dump to enable RLS so you can easily only dump a users data. If your web box becomes compromised the only data that can be accessed now is the data for the customer that is logged in. (Clearly if the comprmoised user is a superuser the original problem persists.)

I can't wait to try these features out in production and see if they deliver on their promises.

References:

https://wiki.postgresql.org/wiki/Row-security

http://www.postgresql.org/docs/9.5/static/sql-createpolicy.html
http://jimkeener.com/posts/postgres-9.5-row-level-security-rls

Thursday, July 16, 2015

Don't test with SQLLite when you use Postgres in Production

Yesterday, I was looking for some help trying to have a clean way to roll back database transactions at the end of a test in a Python TestCase. So, naturally, I started searching and found some posts on Stack Overflow and started to read them. I was instantly mortified by what I read.

But in any case, an in-memory database (SQLite makes this easy) is the best choice for unit tests.

Huh? I, at least, will listen and have a nice healthy debate when people suggest you should mock out every call to the database in the unit tests because if you are talking to the database it's not a unit test its a functional test. That argument is mainly one of semantics of what you call the test. In the case of using a different database engine in a testing environment than a production environment, why would you want to do that?

What's worse is the next stack overflow question I went to had almost the same answer by another person.

You can simply create an in-memory SQLite database and bind your session to that.

So, first, I wanted to stand on the rooftop and yell at the top of my lungs to anyone who would listen. This is a really bad idea. Here's why.

SQLite has different SQL semantics than Postgres, so a query written on SQLite may not run in Postgres
SQLite has different bugs than Postgres, so just because it works in SQLite does not mean it will work in Postgres. Also the reverse is true, just because you can't get it to work in SQLite doesn't mean it won't work in Postgres
Postgres has way more features that SQLite. Try doing a CTE query in SQLite. You can't use any of these features in your application if you test in a different database than you use for your production environment. (It ends up in 2014 SQLite added CTEs to their software, however there are differences, which is the point, for instance it does not appear that you can do insert or update in a SQLite CTE).
How can you test deployments if your test environment uses a different database than your production environment?
How can you test that your application works with the newest version of the database before you push it to production?

This sentiment about testing in a different database than you use in production is not just limited to stackoverflow it seems. I found a document on leading cloud service provider's site that said this:

When To Use SQLite

Testing:

It is an overkill for a large portion of applications to use an additional process for testing the business-logic (i.e. the application's main purpose: functionality).

This is even more strange coming from them since one of the valid reasons cloud service providers give for using their services is that you can have a test environment that mirrors your production environment as closely as possible.

I specifically mentioned SQLite and Postgres, but in reality you could replace any two technologies and the arguments don't change. You wouldn't want to test on SQLite and use MySQL in production, nor would you want to test on Postgres and use Oracle in production.

So remember don't develop and test against one database and release against a different database.

Edited Aug. 4
Fixed spelling error for SQLLite -> SQLite.
SQLite does have CTE support since 2014.

Thanks for all the great comments, and the godd discussion this is creating within the community.