Thursday, February 23, 2012

Evernote on ACID

The ACID guarantees of a transactional database make it very hard to scale a data set out beyond the confines of a single server. Database clustering and multi-master replication are scary dark arts, and key-value data stores provide a much simpler way to scale a single storage pool out across commodity boxes.

Fortunately, this is a problem that Evernote doesn’t currently need to solve. Even though we have nearly a billion Notes and almost 2 billion Resource files within our servers, these aren’t actually a single big data set. They’re cleanly partitioned into 20 million separate data sets, one per user.

This extreme locality means that we don’t have one “big data” storage problem, but rather we have a lot of “medium data” storage problems that partition neatly into a sharded architecture.
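To make the idea of per-user partitioning concrete, here is a minimal sketch in Python. It is an illustration only, not a description of Evernote's actual code: the shard count, the modulo routing rule, and the names `shard_for_user` and `ShardedNoteStore` are all assumptions made for the example. The point it shows is that every read and write for a given user touches exactly one shard, so each shard can be an ordinary single-server store.

```python
from collections import defaultdict

# Hypothetical shard count; the post does not say how many shards exist.
NUM_SHARDS = 4


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick the shard that owns a user.

    Modulo routing is an assumption for illustration; a real system
    might use a lookup table so users can be moved between shards.
    """
    return user_id % num_shards


class ShardedNoteStore:
    """Toy in-memory stand-in for a set of independent per-shard stores."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        # One independent mapping per shard: user_id -> list of note texts.
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def add_note(self, user_id: int, text: str) -> None:
        # A write only ever touches the single shard that owns this user.
        shard = self.shards[shard_for_user(user_id, self.num_shards)]
        shard[user_id].append(text)

    def notes_for(self, user_id: int) -> list:
        # A read is likewise confined to one shard; no cross-shard query.
        shard = self.shards[shard_for_user(user_id, self.num_shards)]
        return list(shard.get(user_id, []))


store = ShardedNoteStore()
store.add_note(5, "grocery list")
store.add_note(9, "meeting notes")
print(store.notes_for(5))
```

Because no operation spans two users, nothing in this sketch ever needs a transaction that crosses shard boundaries, which is what lets each "medium data" store stay on one server.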

