Perhaps the biggest collection of words ever assembled has just gone online: 500 billion of them, from 5 million books published over the past four centuries.
The words make up a searchable database that researchers at Harvard say is a new and powerful tool to study cultural change.
The words are a product of Google’s book-scanning project. The company has so far converted approximately 15 million books into electronic documents. That’s about 15 percent of all books ever published. The collection includes books published in English, Spanish, French, German, Chinese, Russian and Hebrew.
And here is a TED Talk about the project.
Have you played with Google Labs’ Ngram Viewer? It’s an addicting tool that lets you search for words and ideas in a database of 5 million books from across centuries. Erez Lieberman Aiden and Jean-Baptiste Michel show us how it works, and a few of the surprising things we can learn from 500 billion words.