Resource Description
Dear friends!
For one more year our group has been developing an open-source OCR (optical
character recognition) and translation system for Asian languages.
That makes 10 years now :) wow!
So far the system has OCRed more than a million pages for the www.tbrc.org and www.dharmabook.ru libraries, with support from Trace Foundation (www.trace.org), St. Petersburg State University and members of the Moscow Dharma Center Rime community.
The key features of the OCR system include:
At present it is a MacOS server version with OCR for Tibetan, Sanskrit, Sinhala, Kannada, Latin and Cyrillic.
High accuracy
For Tibetan books, the current recognition results are 1-3 errors per 1000
characters. This includes dictionaries and mixed text.
The next stages of development need to reach the same error level for OCR of manuscripts and damaged text.
Fast low-end database with associative search based on a Markov-chain algorithm
1 GB/sec search with fuzzy queries
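
To make the accuracy figure above concrete, here is a minimal sketch of how "errors per 1000 characters" could be measured against a hand-checked reference text. It is not the project's own evaluation code. The function names are hypothetical, and counting Unicode code points as characters is an assumption; for Tibetan the project may well count syllables or glyph stacks instead.

```python
def edit_distance(reference: str, recognized: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def errors_per_thousand(reference: str, recognized: str) -> float:
    """Character errors per 1000 reference characters (assumed metric)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1000.0 * edit_distance(reference, recognized) / len(reference)


if __name__ == "__main__":
    # Hypothetical data: a 1000-character page with two misread characters.
    reference = "a" * 1000
    recognized = "a" * 400 + "b" + "a" * 300 + "b" + "a" * 298
    print(errors_per_thousand(reference, recognized))
```

Under this measure, a result between 1.0 and 3.0 corresponds to the 1-3 errors per 1000 characters quoted above, that is, a character error rate of 0.1% to 0.3%.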