Thursday, September 24, 2009

Web users to help digitise faded books

       Google has acquired a Carnegie Mellon University spin-off that seeks to cut down on spam and fraud at websites while digitising books.
       ReCAPTCHA offers simple word puzzles that users must solve when registering at a website or completing an online purchase.
       Computers can't decipher the twisted letters and numbers, ensuring that real people and not automated programs are at the keyboard.
       Unlike other word puzzles, however, ReCAPTCHA's text comes from actual books, letting the system create a digitised version in the process.
       Google Inc is already behind a major project to digitise books and put them online, mostly by scanning pages and using optical character recognition, or OCR, to make the texts searchable.
       OCR doesn't always work on text that is older, faded or distorted. In such cases, often the only way to digitise the works is to type them in manually.
       ReCAPTCHA provides an alternative. Snippets that the computer doesn't recognize are split up into single words that can be used as human tests at sites all over the Internet.
       The ReCAPTCHA system reassembles the text of the book from those responses.
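       The reassembly step can be pictured with a short sketch. The article does not say how ReCAPTCHA combines answers, so the majority vote, the `min_votes` threshold, the `[?]` placeholder and both function names below are assumptions made for illustration only, not the company's actual method.

```python
from collections import Counter


def resolve_word(responses, min_votes=2):
    """Pick the answer most users typed for one scanned word.

    Assumption: a word counts as solved once at least `min_votes`
    users typed the same thing. Returns None if agreement is too low.
    """
    counts = Counter(r.strip() for r in responses if r.strip())
    if not counts:
        return None
    answer, votes = counts.most_common(1)[0]
    return answer if votes >= min_votes else None


def reassemble(snippet_responses, placeholder="[?]"):
    """Rebuild a snippet from per-word user responses, kept in page order.

    `snippet_responses` is a list with one entry per word; each entry
    is the list of answers typed by different users for that word.
    """
    words = []
    for responses in snippet_responses:
        word = resolve_word(responses)
        words.append(word if word is not None else placeholder)
    return " ".join(words)


if __name__ == "__main__":
    # Hypothetical answers for a three-word snippet OCR could not read.
    answers = [
        ["faded", "faded", "fadcd"],
        ["books", "books"],
        ["online", "onllne"],  # no agreement yet, so the word stays unresolved
    ]
    print(reassemble(answers))
```

       In this sketch a word with too little agreement is left as a placeholder, so it could be shown to more users before the text is accepted.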
       Carnegie Mellon computer science professor Luis von Ahn, who developed the tool and launched the ReCAPTCHA company in 2008, said: "From the start, people assumed the project was connected to Google, so it only makes sense that ReCAPTCHA Inc ultimately would find a home within Google."
