Excerpt from an interview with Facebook vice president of engineering Mike Schroepfer
Your best weapon in most computer science problems is caching. But if a page, like the Facebook home page, is updating every minute or less, then pretty much every time I load it, it's a new page, or at least has new content. That kind of throws the whole caching idea out the window. Doing things in or near real time puts a lot of pressure on the system because the liveness or freshness of the data requires you to query more in real time.
We've built a couple of systems behind that. One of them is a custom in-memory database that keeps track of what's happening in your friends' network and is able to return the core set of results very quickly, much more quickly than having to go and touch a database, for example. And then we have a lot of novel system architecture around how to shard and split out all of this data. There's too much data, updated too fast, to stick it in one big central database. That doesn't work. So we have to separate it out, split it out, to thousands of databases, and then be able to query those databases at high speed.
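To make the sharding idea concrete, here is a minimal sketch in Python. It is not Facebook's actual design, which the interview does not describe in detail. It assumes recent activity is kept in memory, partitioned by a simple modulo of the user ID, and that a home page is built by querying only the shards that hold a reader's friends and merging the partial results. All names here (`NUM_SHARDS`, `shard_for`, `Shard`, `Feed`) are hypothetical.

```python
import heapq
from collections import defaultdict, deque

NUM_SHARDS = 8  # hypothetical; the interview speaks of "thousands" of databases


def shard_for(user_id: int) -> int:
    """Pick the shard that owns a user's activity (assumed: simple modulo)."""
    return user_id % NUM_SHARDS


class Shard:
    """One in-memory partition holding the most recent items per user."""

    def __init__(self, max_per_user: int = 100):
        # A bounded deque per user keeps memory use predictable.
        self.recent = defaultdict(lambda: deque(maxlen=max_per_user))

    def append(self, user_id: int, timestamp: int, text: str) -> None:
        self.recent[user_id].appendleft((timestamp, user_id, text))

    def latest(self, user_ids, limit: int):
        """Return the newest `limit` items among the given users on this shard."""
        items = []
        for uid in user_ids:
            items.extend(self.recent.get(uid, ()))
        return heapq.nlargest(limit, items)


class Feed:
    """Routes writes to one shard and fans reads out to the relevant shards."""

    def __init__(self):
        self.shards = [Shard() for _ in range(NUM_SHARDS)]

    def post(self, user_id: int, timestamp: int, text: str) -> None:
        self.shards[shard_for(user_id)].append(user_id, timestamp, text)

    def home(self, friend_ids, limit: int = 10):
        # Group friends by shard so each shard is queried at most once.
        by_shard = defaultdict(list)
        for fid in friend_ids:
            by_shard[shard_for(fid)].append(fid)
        partials = [self.shards[s].latest(ids, limit) for s, ids in by_shard.items()]
        # Merge the per-shard results into one newest-first list.
        return heapq.nlargest(limit, (item for part in partials for item in part))


feed = Feed()
feed.post(1, 10, "a")
feed.post(9, 20, "b")
feed.post(2, 15, "c")
feed.post(3, 30, "d")  # not a friend of the reader below
print(feed.home([1, 9, 2]))
```

Each write touches exactly one shard, so no single store has to absorb every update. A read costs one query per shard that holds at least one friend, which in a real deployment would be issued in parallel rather than in a loop as it is here.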