Project Gutenberg makes over 33,000 previously published books available for free as e-books. This is done with the help of thousands of volunteers through a project called Distributed Proofreaders. The contributions made by these volunteers empower readers to enjoy these books on Apple’s iPad, Kindle, Android, and similar platforms.
Whether text is produced by OCR or typed by hand, it will contain errors. Human proofreading is therefore a necessary step before a book is converted into a downloadable e-book. As with translation, only a real person can spot and correct the errors. I often notice at least a few typos in newly published books, and I wonder why authors don’t use crowd-sourcing to get their chapters proofread. The ability to read the content early is reward enough for volunteers.
Old magazine articles, comics and famous letters from India can be made available with the power of distributed or crowd-powered proofreading. It’s unfortunate that there are no digitized old books available in Indian languages on Gutenberg.
Using Dubzer (free crowd-sourced proofreading), Lipikaar (easy Unicode-based typing for Indian languages), Pothi (self-publishing, print on demand, downloadable e-books), and other such web-based platforms, we can create a digital library of timeless Indian content whose copyright has expired and which can be publicly distributed. Even semi-urban or rural readers who read well in their local language but have poor access to libraries will be able to make reading an enjoyable leisure activity. With India’s 3G-powered smartphone revolution, is this hard to imagine? We can initially aim to create 100 e-book titles in each Indian language, including English.
The possibilities are exciting and challenging. These ideas came out of our conversations with Abhaya Agarwal, co-founder of Pothi.com, who has a keen interest in the work of Indian authors and journalists whose writing never had the benefit of digitization.
We would love to jump-start this initiative with a group of like-minded folks. Do write to us if you have any of these: insights or leads to similar attempts, OCR expertise, relevant open-source OCR software, timeless books/articles/magazines/literature, typed text, etc. Even if you don’t have these, please join in with your ideas and enthusiasm. Students are welcome too!
Update (February 1, 2011)
StoryDB.in (A Story Database for India) has been created. Four Hindi books that are now out of copyright, contributed to the database by Abhaya Agarwal, have been listed here.