Lieberman Aiden and Jean-Baptiste Michel, both at Harvard's Program for Evolutionary Dynamics, led the project, which they've dubbed "culturomics"—a portmanteau combining "culture" and "genomics." The first fruit of their labors was a mammoth database of the words in about 5.2 million books published between 1800 and 2000—roughly four percent of all published books. These came from the Google Books project, whose library contains 15 million books.
In today's issue of the journal Science, the researchers introduce their project along with some of the first results they've derived from the data. In connection with the publication, Google is rolling out an application (at www.culturomics.org) that allows anyone to access and analyze the finished database, which includes 2 billion words and phrases.