
Multi-lingual optical character recognition (OCR)
Why is it in news?
- Recently, IIT Madras has developed a unified script for nine Indian languages, named the Bharati script.
- Going a step further, the institute has now developed a method for reading documents in Bharati script using a multi-lingual optical character recognition (OCR) scheme.
More in news
- Finger-spelling technique: (1) The team has also created a finger-spelling method that can be used to generate a sign language for hearing-impaired persons. (2) It has found a way for persons with hearing disabilities to generate signatures using this finger-spelling technique.
- Integrated script: (1) The scripts that have been integrated are Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam and Tamil. (2) English and Urdu have not been integrated so far.
- Basics of OCR scheme: (1) In general, optical character recognition schemes first separate (or segment) the document into text and non-text regions. (2) The text is then segmented into paragraphs, sentences, words and letters. Each letter has to be recognised as a character in some recognisable encoding such as ASCII or Unicode. (3) Each letter in turn has various components, such as the basic consonant, consonant modifiers and vowels.
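The last two points above (splitting text into smaller units, then breaking a letter into its components) can be illustrated with a small Python sketch. This is not the IIT Madras method and does not handle Bharati script itself; it assumes the text has already been recognised and uses Devanagari Unicode code points as a stand-in. The function names `classify`, `components` and `segment` are hypothetical, chosen only for this illustration.

```python
import unicodedata


def classify(ch):
    """Label one code point by the role its Unicode name suggests (illustrative only)."""
    name = unicodedata.name(ch, "")
    if "VOWEL SIGN" in name:
        return "vowel sign"      # dependent vowel attached to a consonant
    if "LETTER" in name:
        return "base letter"     # consonant or independent vowel
    if "SIGN" in name:
        return "modifier sign"   # e.g. virama, anusvara
    return "other"


def components(word):
    """Break a word into (code point, role) pairs."""
    return [(f"U+{ord(ch):04X}", classify(ch)) for ch in word]


def segment(text):
    """Split text into paragraphs (blank-line separated), then words, then components."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return [[components(w) for w in p.split()] for p in paragraphs]


# Devanagari "ki": the consonant KA (U+0915) followed by the vowel sign I (U+093F)
print(components("\u0915\u093f"))
```

A real OCR system has to do this work in the image domain, before any Unicode text exists; the sketch only shows the kind of letter-level structure that the recogniser ultimately has to output.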
Source
The Hindu