MIT Technology Review

Courtesy Luis von Ahn

Luis von Ahn, 29

Using “captchas” to digitize books

Carnegie Mellon University

Luis von Ahn is a pioneer of "captchas," those strings of distorted characters that websites force you to recognize and type to establish that you are a person and not a malevolent computer. But he finds the technology's success a mixed blessing. "At first I was feeling quite proud of myself," says von Ahn, a 2006 MacArthur "genius grant" recipient who created captchas (an acronym for "completely automated public Turing test to tell computers and humans apart") for Yahoo in 2000 to thwart automated e-mail account registration, a tool of spammers. "But then I was feeling bad, because every time you solve a captcha, you waste 10 seconds." People around the world solve an estimated 60 million captchas every day, adding up to more than 150,000 wasted hours a day.
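The wasted-time figure can be checked from the article's own two estimates, 60 million captchas a day at about 10 seconds each. A quick sketch of the arithmetic:

```python
# The article's two estimates: captchas solved worldwide per day,
# and seconds spent on each one.
CAPTCHAS_PER_DAY = 60_000_000
SECONDS_PER_CAPTCHA = 10
SECONDS_PER_HOUR = 3600

def wasted_hours_per_day(captchas: int, seconds_each: float) -> float:
    """Total human time spent solving captchas in one day, in hours."""
    return captchas * seconds_each / SECONDS_PER_HOUR

hours = wasted_hours_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # 166,667 hours per day
```

The result, roughly 167,000 hours, is consistent with the article's "more than 150,000."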


Von Ahn, an assistant professor of computer science, is a leader in using human skills to make computers work better. For example, he created an online game in which players identify elements in photographs; their answers help improve image-search algorithms. He's now trying to put captchas to work in one of the epic efforts of the information age: digitizing millions of old books and making them searchable online.
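The article doesn't say how such a game turns play into trustworthy labels. A minimal sketch, assuming the agreement rule von Ahn's ESP Game is known for: two players who cannot communicate see the same image, and a word counts as a label only when both of them type it. The names `play_round` and `labels` are hypothetical, and the simplification of taking the first of player A's guesses that matches is an assumption.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical store: image id -> set of agreed-on labels.
labels: defaultdict = defaultdict(set)

def play_round(image_id: str, guesses_a: list[str], guesses_b: list[str]) -> Optional[str]:
    """Return the first of player A's guesses that player B also typed.

    Assumed agreement rule: a word becomes a label only when two
    independent players both enter it for the same image.
    """
    b_words = {g.strip().lower() for g in guesses_b}
    for guess in guesses_a:
        word = guess.strip().lower()
        if word in b_words:
            labels[image_id].add(word)  # agreement: keep the word as a label
            return word
    return None                         # no match: nothing is recorded

play_round("photo-1", ["Dog", "grass", "ball"], ["park", "dog"])
play_round("photo-1", ["sky"], ["tree"])
print(dict(labels))  # {'photo-1': {'dog'}}
```

Requiring two strangers to agree filters out most random or deliberately wrong answers without anyone checking them by hand.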


An estimated 8 percent of words in these old books can't be read by the optical character recognition (OCR) software used to scan them. Von Ahn has teamed with the nonprofit Internet Archive to use captchas to help interpret those words. After all, he says, "while you are solving a captcha, you are solving a task that computers can't perform." So he created a tool, called "recaptcha," that pairs an unknown word with a known one. He distorts both words and puts a line through them, standard techniques for creating captchas. A user must decipher both words to access a site. Typing the known word correctly serves the security purpose of a captcha; it also adds a measure of confidence that the user identified the unknown word correctly, so that answer can be used in place of the OCR software's gibberish.

Volunteers have begun deploying recaptchas, and the technique has been used to decipher two million words for the Internet Archive's book digitization effort. Recaptchas tap the joint power of people, networks, and computers in a way that should have a big impact, says Brewster Kahle, an Internet entrepreneur and cofounder of the archive: "It is like an army of ants building the Taj Mahal."
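The pairing scheme described above can be sketched in a few lines. This is a minimal illustration, not recaptcha's actual implementation: the image ids, the case-insensitive comparison, and the rule that an unknown word is accepted once `VOTES_NEEDED` users (here 3) agree on it are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical example data: image id -> the word it is known to contain.
KNOWN = {"img-17": "morning"}   # control word whose answer is known
VOTES_NEEDED = 3                # assumed agreement threshold, illustrative only

votes: defaultdict = defaultdict(Counter)  # unknown id -> answer counts
transcribed: dict[str, str] = {}           # unknown id -> accepted word

def submit(known_id: str, known_answer: str,
           unknown_id: str, unknown_answer: str) -> bool:
    """Grade one two-word challenge; return True if the user passes.

    Only the known word is checked. Passing it is the security test, and it
    is also treated as evidence that the user's reading of the unknown word
    is worth counting as a vote. Comparison is case-insensitive (assumption).
    """
    if known_answer.strip().lower() != KNOWN[known_id]:
        return False                    # failed the control word: no vote recorded
    word = unknown_answer.strip().lower()
    votes[unknown_id][word] += 1
    if votes[unknown_id][word] >= VOTES_NEEDED and unknown_id not in transcribed:
        transcribed[unknown_id] = word  # enough users agree: replace the OCR output
    return True

submit("img-17", "morning", "scan-42", "upon")
submit("img-17", "mornig", "scan-42", "apon")   # control word mistyped: rejected
submit("img-17", "Morning", "scan-42", "upon")
submit("img-17", "morning", "scan-42", "Upon ")
print(transcribed)  # {'scan-42': 'upon'}
```

Counting several independent answers before accepting a word is one way to turn "a measure of confidence" from each user into a reliable transcription; a single user who passes the control word could still misread the unknown one.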



Credit: reCAPTCHA

This image illustrates the difficulty that optical-character-recognition software can have in interpreting the content of older books. Luis von Ahn's recaptcha project is designed to help replace the OCR gibberish with the actual words.


--David Talbot

2007 TR35 Winners

Sanjit Biswas

Cheap, easy Internet access

Josh Bongard

Adaptive robots

Garrett Camp

Discovering more of the Web

Mung Chiang

Optimizing networks

Tadayoshi Kohno

Securing systems cryptographically

Tariq Krim

Building a personal, dynamic Web page

Ivan Krstić

Making antivirus software obsolete

Jeff LaPorte

Internet-based calling from mobile phones

Karen Liu

Bringing body language to computer-animated characters

Anna Lysyanskaya

Securing online privacy

Tapan Parikh

Simple, powerful mobile tools for developing economies

Babak Parviz

Self-assembling micromachines

Partha Ranganathan

Power-aware computing systems

Kevin Rose

Online social bookmarking

Marc Sciamanna

Controlling chaos in telecom lasers

Desney Tan

Teaching computers to read minds

Luis von Ahn

Using “captchas” to digitize books
