Multi-lingual optical character recognition (OCR)

Why is it in the news?
  • IIT Madras recently developed a unified script for nine Indian languages, named the Bharati script.
  • Going a step further, it has now developed a method for reading documents in the Bharati script using a multi-lingual optical character recognition (OCR) scheme.
More in the news
  • Finger-spelling technique:
    (1) IIT Madras has also created a finger-spelling method that can be used to generate a sign language for hearing-impaired persons.
    (2) It has found a way for persons with hearing disabilities to generate signatures using this finger-spelling technique.
  • Integrated script:
    (1) The scripts that have been integrated are Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam and Tamil.
    (2) English and Urdu have not been integrated so far.
  • Basics of OCR scheme:
    (1) In general, optical character recognition schemes first separate (or segment) the document into text and non-text regions.
    (2) The text is then segmented into paragraphs, sentences, words and letters. Each letter has to be recognised as a character in a standard encoding such as ASCII or Unicode.
    (3) Each letter has various components, such as the basic consonant, consonant modifiers and vowels.
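
The segmentation hierarchy and the idea of letter components described above can be illustrated with a minimal Python sketch. It is only an illustration, not the IIT Madras method: it assumes the text has already been digitised, whereas a real OCR system performs these steps on image pixels. The function names segment_text and describe_components are hypothetical, and the "letters" here are individual Unicode code points.

```python
import re
import unicodedata


def segment_text(text):
    """Split digitised text into paragraphs -> sentences -> words -> code points.

    Assumption: paragraphs are separated by blank lines and sentences end
    with '.', '!', '?' or the Devanagari danda (U+0964).
    """
    result = []
    for para in re.split(r"\n\s*\n", text.strip()):
        sentences = [
            s for s in re.split(r"(?<=[.!?\u0964])\s+", para.strip()) if s
        ]
        # Each word becomes a list of its code points.
        result.append([[list(word) for word in s.split()] for s in sentences])
    return result


def describe_components(letter_cluster):
    """Label each code point of a written letter with a rough role.

    Uses Unicode general categories: 'Lo' covers base letters, while
    'Mn' and 'Mc' cover combining marks such as vowel signs and modifiers.
    """
    parts = []
    for ch in letter_cluster:
        category = unicodedata.category(ch)
        if category == "Lo":
            role = "base letter"
        elif category in ("Mn", "Mc"):
            role = "modifier or vowel sign"
        else:
            role = "other"
        parts.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), role))
    return parts


if __name__ == "__main__":
    sample = "Hello world. Second one.\n\nNew para."
    print(segment_text(sample))
    # Devanagari consonant KA followed by the vowel sign I.
    for part in describe_components("\u0915\u093f"):
        print(part)
```

Running describe_components on the Devanagari consonant KA followed by the vowel sign I shows how one written letter breaks down into a base consonant and a vowel sign, each with its own Unicode code point.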
Source
The Hindu




Posted by Jawwad Kazi on 30th Apr 2019