Another example of Google leveraging its search-related database
[…] Google also has a huge amount of data on how people use search, and it was able to use that to train its algorithms. If the system has trouble interpreting one word in a query, for instance, it can fall back on data about which terms are frequently grouped together.
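To make the fallback idea concrete, here is a minimal sketch of how co-occurrence data from a query log could break a tie between two words the recognizer cannot tell apart. It illustrates the general technique only and is not Google's actual method: the toy query log, the function names `build_cooccurrence` and `pick_word`, and the simple count-based scoring are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(queries):
    """Count how often each unordered pair of terms appears in the same query."""
    counts = Counter()
    for query in queries:
        terms = sorted(set(query.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def pick_word(context, candidates, counts):
    """Return the candidate that co-occurs most often with the context terms."""
    def score(word):
        # Pairs are stored in sorted order, so sort before looking them up.
        return sum(counts[tuple(sorted((word, c)))] for c in context)
    return max(candidates, key=score)


# Hypothetical query log, invented for illustration.
log = [
    "pizza delivery boston",
    "pizza near me",
    "boston pizza",
    "pisa italy tower",
    "leaning tower pisa",
]
counts = build_cooccurrence(log)

# Suppose the recognizer is unsure whether it heard "pizza" or "pisa".
best = pick_word(["boston", "delivery"], ["pizza", "pisa"], counts)
print(best)
```

With the context words "boston" and "delivery", the sketch prefers "pizza" because those terms appear together in the toy log. A real system would presumably combine a score like this with the acoustic evidence rather than rely on counts alone.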
Google also had a useful set of data correlating speech samples with written words, culled from its free directory service, Goog411. People call the service and say the name of a city and state, and then say the name of a business or category. According to Mike Cohen, a Google research scientist, voice samples from this service were the main source of acoustic data for training the system.
But the data that Google used to build the system pales in comparison to the data that it now has the chance to collect.