reCAPTCHA is an excellent example of solving an information-processing problem in a creative way and, in the process, solving a much larger problem as well.
Before you can understand reCAPTCHA, you must first understand its predecessor: CAPTCHA. CAPTCHA was created to stop automated programs (or “bots”) from logging into websites and generating spam in the form of emails and mass postings.
A CAPTCHA screen displays a distorted image of letters or words. A person can read the letters, but a bot cannot. The user must enter the letters correctly to gain access to the system, for example, to sign up for an email account.
This technology alone is a great example of a creative solution to a complex problem. But reCAPTCHA takes it a step further by solving an even bigger problem.
This larger problem involves an ancient form of communication: the printed page. There are tens of thousands of books and newspapers that Google is trying to convert to digital text. Scanning the publications and then using OCR (optical character recognition) to convert the scanned images to text has its limits. If the printed text is distorted (as it is in many of the older publications), the OCR software cannot convert it.
How does this relate to CAPTCHA? Well, people solve about 200 million CAPTCHAs every day. If each CAPTCHA takes ten seconds, this effort represents about 63 person-years of work every day.
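The 63 person-years figure can be checked with a few lines of arithmetic. This is only a quick sketch: it uses the numbers quoted above and assumes a "person-year" means a full calendar year of 365 days, around the clock. The constant names are mine.

```python
# Figures quoted in the text above
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10

# Assumption: one person-year is 365 days of 24 hours each
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

total_seconds = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA
person_years = total_seconds / SECONDS_PER_YEAR

print(round(person_years, 1))  # prints 63.4
```

If you counted only eight-hour working days instead, the figure would be several times larger, so 63 is on the conservative side.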
Wouldn’t it be amazing if there were a way to put all this time to good use? That is exactly what reCAPTCHA does.
Here’s how reCAPTCHA works:
- When a document is scanned, the OCR software flags any word that it cannot convert. Let’s call this the “unknown word”.
- The reCAPTCHA process sends this unknown word out as a CAPTCHA for people to decipher.
- The CAPTCHA contains not only the “unknown word”, but another word which the system already knows. We’ll call this the “known word”.
- In the CAPTCHA that is created, the user is asked to read both words and enter them.
- If the user enters the known word correctly, the system assumes that their answer for the unknown word is probably correct too.
- The system also gives the unknown word to a few other people to verify that the original answer was correct.
- If enough people agree on what the unknown word is, the answer is sent back to the original system and the converted word is added to the document that is being digitized.
- This process is repeated until all the words in the document are converted.
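The steps above can be sketched in a few lines of Python. This is an illustration of the idea as described here, not reCAPTCHA's actual code: the `UnknownWord` class, the `submit_answer` function, and the agreement threshold of three are all assumptions of mine.

```python
from collections import Counter

# Assumption: three matching answers count as "enough people agree"
AGREEMENT_THRESHOLD = 3


class UnknownWord:
    """A scanned word that OCR could not convert (hypothetical structure)."""

    def __init__(self, image_id):
        self.image_id = image_id
        self.votes = Counter()     # candidate text -> number of users who typed it
        self.resolved_text = None  # set once enough users agree


def submit_answer(unknown, known_text, known_answer, unknown_answer):
    """Process one user's CAPTCHA attempt. Returns True if the user passes."""
    # The known word is the real test: a wrong answer fails the CAPTCHA,
    # and the user's guess for the unknown word is ignored.
    if known_answer.strip().lower() != known_text.lower():
        return False

    # The user read the known word correctly, so count their reading
    # of the unknown word as one vote.
    unknown.votes[unknown_answer.strip().lower()] += 1

    # Once enough users agree, accept the most common reading.
    text, count = unknown.votes.most_common(1)[0]
    if count >= AGREEMENT_THRESHOLD:
        unknown.resolved_text = text
    return True


word = UnknownWord("scan-001")
submit_answer(word, "morning", "morning", "overlooks")
submit_answer(word, "morning", "morning", "overlocks")   # one user misreads it
submit_answer(word, "morning", "Morning", "Overlooks")
submit_answer(word, "morning", "morning", "overlooks")
print(word.resolved_text)  # prints overlooks
```

Notice that the user is only ever graded on the known word. The unknown word is along for the ride, which is why several independent answers are needed before it is trusted.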
Can you even begin to imagine the flash of genius that occurred in the mind of Luis von Ahn, the creator of the reCAPTCHA process?
The problem is that these types of “eureka” moments are very difficult to create. They often just happen, much like the weather. You can no more force yourself to be creative than you can force yourself to love, hate, forget something, fall asleep or go back in time.
However, you can sometimes find creative solutions if you just stop what you’re doing, and ask yourself some questions, such as:
- Is there a better way to present this information to the end user?
- What else would a user need to know about this concept, task, or thing?
- How does the user use our documents?
- What changes could be made to enhance the documentation development process?
I’ll give some examples of real-life creative solutions that I’ve encountered:
Example 1: Our help files have to be checked into a version control system. Each help project can contain hundreds of individual files, and these files are often created, deleted, moved and renamed. It would have been very cumbersome to keep track of each file that was checked in and out. The solution (from a colleague of mine) was this: instead of checking in and out the various files, a zip file of the entire help system was created and checked in instead. The installation program then decompresses this zip file. Only one file now needs to be sent and tracked in the build.
Example 2: I was working with a developer on a complex database administration application. One of the functions the user could perform was to rerun a query by clicking a button labeled, appropriately enough, Rerun query. The developer said the problem was that there were many different queries that the user could run, and that they needed a quick way to know which one they had run before re-running it. I asked if it was possible to embed the name of the query that had just run into the button label, so that, for example, if the user had run the Last Name query, the button label would be Rerun Last Name query. I still remember the developer’s eyes widening and his face lighting up as he recognized the elegant beauty of this solution. “Yes,” he said, “it can be done!”
Example 3: Many of our help projects share content, templates, and other settings. I wanted to develop a simple content management system that would allow all the writers to share these things across many locations. I created a master help project that contained all the common content and settings. I then linked my other help projects to this master project, so that if any of the common material changed, it would automatically be updated in the other help projects. Finally, I stored all the documentation on a version control system that could be accessed by any writer. As long as each writer has the current version of the master help project and links their other help projects to it, the templates and content remain standard.
So don’t just think “outside the box”.
Ask yourself if you even need the box in the first place.