Digital Library of 100 timeless titles in each Indian language. Interested?
by Anjali Gupta
Project Gutenberg makes over 33,000 previously published books available for free as e-books. This is done with the help of thousands of volunteers through a project called Distributed Proofreaders. The contributions made by these volunteers empower readers to enjoy these books on Apple’s iPad, Kindle, Android, and similar platforms.
With OCR or even manual typing, there will be several errors in the text produced. Human proofreading is therefore a necessary step before the book is converted into a downloadable e-book. As with translation, only a real person can spot and correct the errors. I often notice at least a few typos in newly published books, and I wonder why authors don’t employ crowd-sourcing to get their chapters proofread. The ability to read the content early is reward enough for volunteers.
Old magazine articles, comics and famous letters from India can be made available with the power of distributed or crowd-powered proofreading. It’s unfortunate that there are no digitized old books available in Indian languages on Gutenberg.
Using Dubzer (free crowd-sourced proofreading), Lipikaar (easy Unicode-based typing for Indian languages), Pothi (self-publishing, print on demand, downloadable e-books), and other such web-based platforms, we can create a digital library of timeless Indian content whose copyright has expired and which can be publicly distributed. Even semi-urban or rural folks who read well in their local language but have poor access to libraries will be empowered to make reading an enjoyable leisure activity. With India’s 3G-powered smartphone revolution, is this hard to imagine? We can initially aim to create 100 e-book titles in each Indian language, including English.
The possibilities are exciting and challenging. These ideas came up as a result of our conversations with Abhaya Agarwal, co-founder of Pothi.com, who has a keen interest in the work published by Indian authors/journalists who did not have the benefit of digitization.
We would love to jump-start this initiative with a group of like-minded folks. Do write to us if you have any of these: insights or leads to such attempts, OCR expertise, relevant open-source OCR software, timeless books/articles/magazines/literature, typed text, etc. Even if you don’t have these, please join in with your ideas and enthusiasm. Students are welcome too!
Update (February 1, 2011)
StoryDB.in (A Story Database for India) has been created. Four Hindi books that are now out of copyright, contributed to the database by Abhaya Agarwal, are listed on it.
Anjali –
1) You may want to check out Bhandarkar institute in Pune (http://www.bori.ac.in) for leads.
2) For OCR software, you may want to try this: http://stackoverflow.com/questions/2078800/ocr-for-devanagari-hindi-marathi-sanskrit. If you have not used this site before, I should say that it is very useful and full of experts who are eager to answer technical questions, so you can try posting your own question.
3) I was thinking that perhaps the response from the community will be better if the initial set of books have some common theme, such as Indian mythology or Indian arts/culture or notable Indian personalities. This will give some purpose to the initiative.
4) Also on a somewhat related note, check out the Rosetta project at http://childrensbooksonline.org/library.htm. The site operates through volunteer work and makes illustrated antique children’s books available (scanned images, no OCR yet), and each page has a menu to access the translated content. (btw, not much is translated into Indian languages except for 2 books in Hindi. I recently submitted a translation into Marathi, which is not up yet). A few things were striking: because these are children’s books, the content per book is smaller and there is variation in the complexity. So a volunteer can start off with simple books and then get into more complex/long ones. When it comes to translations, I do like the idea of one translator “owning” an entire translation. This makes sure that the language and style of the translated version are consistent throughout, and credits can be given on a per-book basis, which encourages translators to translate more books.
More later ~
These are all great leads! Will keep you posted. Thanks so much Prachi.
[…] read about the plan via a tweet. And, good ol’ me is excited. There has been far too much talk about this and […]
hi anjali, i am with the mint newspaper and wanted to speak with you for an article i am working on.
thanks, pl email me. thanks,
himanshu
Thanks @sankarshan for spreading the word and sharing your insights.
Hey I would be interested to be a volunteer. Just tell me where I can sign up.
hi anjali,
I am interested in creating an e-library and in proofreading too. Please provide me details.
I can help you with bengali.
good effort carry on anjali.