Have you ever encountered a web form that asked you to type a distorted sequence of characters before you could proceed? If you have, you might have wondered what the purpose of this annoying task was. Well, you were actually solving a CAPTCHA, a test that is designed to tell humans and computers apart. CAPTCHAs are used to prevent spam and abuse on the web, such as automated ticket scalping or fake account creation.
But did you know that CAPTCHAs are also helping to digitize books and preserve human knowledge? In this blog post, I will tell you the story of how CAPTCHAs evolved from a security measure to a social good project.
What is a CAPTCHA and why do we need it?
The word CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It was coined by Luis von Ahn and his colleagues at Carnegie Mellon University in the early 2000s. A CAPTCHA is a challenge-response test that requires the user to perform a task that is easy for humans but hard for computers. Examples include reading distorted text, identifying objects in images, and listening to audio clips. The idea is that a human can pass the test, while a computer program would fail. This way, the web service can verify that the user is not a bot that is trying to abuse the system.
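To make the challenge-response idea concrete, here is a minimal Python sketch of the server side: it remembers the expected answer for each challenge it issues and checks a reply exactly once. The class name `CaptchaService` and its methods are hypothetical, invented for illustration. This is not how any real CAPTCHA service is implemented, and it leaves out the hard part, which is generating a challenge that bots cannot read.

```python
import hmac
import secrets


class CaptchaService:
    """Toy challenge-response store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps challenge id -> expected answer; this never leaves the server.
        self._pending = {}

    def issue(self, expected_answer: str) -> str:
        """Register a challenge and return an unguessable id for it."""
        challenge_id = secrets.token_hex(16)
        self._pending[challenge_id] = expected_answer
        return challenge_id

    def verify(self, challenge_id: str, response: str) -> bool:
        """Check a response; each challenge can be attempted only once."""
        expected = self._pending.pop(challenge_id, None)
        if expected is None:
            return False  # unknown id, or the challenge was already used
        # Compare case-insensitively, ignoring surrounding whitespace.
        return hmac.compare_digest(
            expected.strip().lower().encode("utf-8"),
            response.strip().lower().encode("utf-8"),
        )
```

Removing the challenge on the first attempt is a deliberate choice in this sketch: a bot cannot keep guessing against the same image, it has to request a new one each time.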
CAPTCHAs are widely used on the web for various purposes, such as:
- Protecting online polls from being manipulated by bots
- Preventing comment spam on blogs and forums
- Blocking brute-force attacks on passwords and accounts
- Reducing email spam by requiring users to solve a CAPTCHA before sending a message
- Controlling access to online content and services, such as ticket sales, online games, or streaming platforms
CAPTCHAs are necessary because there are many malicious actors on the web who want to exploit the resources and information of legitimate users and websites. By using CAPTCHAs, web services can increase their security and reliability, and provide a better user experience for their human customers.
How CAPTCHAs became a tool for digitizing books
While CAPTCHAs are useful for security, they also have a downside: they waste a lot of human time and effort. According to Luis von Ahn, about 200 million CAPTCHAs are solved every day by people around the world, and each CAPTCHA takes about 10 seconds to complete. That means that humanity is spending more than 500,000 hours every day typing these annoying CAPTCHAs. That’s a lot of time that could be used for something more productive and meaningful.
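If you want to check that estimate yourself, the arithmetic fits in a few lines of Python. The two inputs are the figures quoted above; the constant names are just labels I picked for this sketch.

```python
# Figures quoted from Luis von Ahn in the paragraph above.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10
SECONDS_PER_HOUR = 3600

total_seconds = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA
total_hours = total_seconds / SECONDS_PER_HOUR

print(f"{total_hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

So the exact product of the two quoted figures is a little above the commonly cited round number of 500,000 hours.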
That’s why Luis von Ahn and his team came up with an ingenious idea: what if we could use the human brainpower that is spent on solving CAPTCHAs for something that is good for humanity? What if we could use CAPTCHAs to help digitize books and preserve human knowledge?
This is how reCAPTCHA was born. reCAPTCHA is a project that uses CAPTCHAs to help digitize books that are too old and damaged for computers to read. The process works like this:
There are many projects that are trying to digitize books, such as Google Books, the Internet Archive, or Amazon Kindle. These projects scan the physical books and convert them into digital images, one for each page.
The next step is to use optical character recognition (OCR) software to recognize the text in the images and convert it into editable and searchable text. However, OCR is not perfect, especially for older books that have faded ink, yellowed pages, or handwritten notes. OCR can fail to recognize about 30% of the words in these books, which means that a lot of valuable information is lost or corrupted.
This is where reCAPTCHA comes in. reCAPTCHA takes the words that OCR cannot recognize and sends them to the web as CAPTCHAs. When you solve a reCAPTCHA, you are not only proving that you are human, but also helping to transcribe a word from a book that is being digitized. By combining the answers from multiple users, reCAPTCHA can verify the correct transcription of the word and add it to the digital text of the book.
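The "combining the answers from multiple users" step can be pictured as a simple vote. The Python sketch below is an assumption on my part, not reCAPTCHA's actual algorithm: the function name `consensus_transcription` and the thresholds `min_votes` and `min_agreement` are hypothetical values chosen for illustration.

```python
from collections import Counter
from typing import Optional


def consensus_transcription(
    answers: list[str],
    min_votes: int = 3,          # assumed: minimum number of answers needed
    min_agreement: float = 0.6,  # assumed: share of answers that must match
) -> Optional[str]:
    """Return the agreed transcription, or None if users do not agree yet."""
    # Normalize so that "Morning " and "morning" count as the same answer,
    # and drop empty submissions.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough answers collected yet
    word, votes = Counter(normalized).most_common(1)[0]
    if votes / len(normalized) >= min_agreement:
        return word
    return None  # answers are too split; keep showing the word to more users


# Three of four users agree, so the word is accepted.
print(consensus_transcription(["morning", "Morning ", "moming", "morning"]))
# prints "morning"
```

A word that comes back as `None` would simply be shown to more people until enough of them agree.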
This way, reCAPTCHA is using the collective intelligence of millions of people to help digitize books and make them accessible to everyone. reCAPTCHA is also helping to preserve the cultural and historical heritage of humanity, by saving books that would otherwise be lost or forgotten.
How reCAPTCHA is making the web a better place
reCAPTCHA is not only a clever way to use CAPTCHAs for a good cause, but also a successful example of how technology can be used to create positive social impact. reCAPTCHA was launched in 2007 as a research project at Carnegie Mellon University, and then spun off as a start-up company. In 2009, Google acquired reCAPTCHA and integrated it into its services, such as Gmail, YouTube, and Google Books. Since then, reCAPTCHA has grown to become one of the most popular and widely used CAPTCHA systems on the web, with more than a billion users and hundreds of thousands of websites using it.
According to Google, reCAPTCHA has helped to digitize more than 100 million books and 25 billion newspaper pages, and has improved the accuracy of OCR by more than 25%. reCAPTCHA has also contributed to other social good projects, such as:
- Helping to translate the web into different languages, by inspiring Duolingo, a later project co-founded by Luis von Ahn that crowdsourced translations of words and phrases
- Helping to improve Google Maps, by using reCAPTCHA to transcribe street names and house numbers from Street View images
- Helping to protect endangered languages, by using reCAPTCHA to collect and preserve words and sentences from languages that are at risk of extinction
- Helping to fight spam and abuse, by using reCAPTCHA to detect and block bots and malicious traffic on the web
reCAPTCHA is a remarkable example of how a simple idea can have a huge impact on the world. By turning a mundane and annoying task into a meaningful and rewarding one, reCAPTCHA has transformed CAPTCHAs from a waste of time into a force for good.
reCAPTCHA has shown that technology can be used not only to solve problems, but also to create opportunities and value for society. reCAPTCHA has demonstrated that we can use the power of the web to make the world a better place, one CAPTCHA at a time.