What we learned from 5 million books

Perhaps the biggest collection of words ever assembled has just gone online: 500 billion of them, from 5 million books published over the past four centuries.

The words make up a searchable database that researchers at Harvard say is a new and powerful tool to study cultural change.

The words are a product of Google’s book-scanning project. The company has converted approximately 15 million books so far into electronic documents. That’s about 15 percent of all books ever published. It includes books published in English, Spanish, French, German, Chinese, Russian and Hebrew.

The full NPR article.

And a TED Talk about the project.

Have you played with Google Labs’ Ngram Viewer? It’s an addictive tool that lets you search for words and ideas in a database of 5 million books from across centuries. Erez Lieberman Aiden and Jean-Baptiste Michel show us how it works, and a few of the surprising things we can learn from 500 billion words.

Private and public in the Ngram Viewer.

 
