Multi-lingual optical character recognition (OCR)

Why is it in news?

  • Recently, IIT Madras has developed a unified script for nine Indian languages, named the Bharati script.
  • Going a step further, it has now developed a method for reading documents written in the Bharati script using a multi-lingual optical character recognition (OCR) scheme.

More in news

  • Finger-spelling technique:
    (1) It has also created a finger-spelling method that can be used to generate a sign language for hearing-impaired persons.
    (2) It has found a way for persons with hearing disability to generate signatures using this finger-spelling technique.
  • Integrated script:
    (1) The scripts that have been integrated include Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam and Tamil.
    (2) English and Urdu have not been integrated so far.
  • Basics of OCR scheme:
    (1) In general, an optical character recognition scheme first separates (or segments) the document into text and non-text regions.
    (2) The text is then segmented into paragraphs, sentences, words and letters. Each letter has to be recognised as a character and encoded in a standard format such as ASCII or Unicode.
    (3) Each letter can in turn have several components, such as the basic consonant, consonant modifiers and vowels.
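To make the OCR steps above more concrete, here is a minimal sketch in Python. It is an illustration only, not the IIT Madras method. It assumes the page has already been binarised into rows of 0/1 pixels, with 1 meaning ink. It then uses the classic projection-profile technique: blank rows separate text lines, and blank columns separate glyphs. The last part shows how a letter's components (a base consonant plus a vowel sign) are stored as separate Unicode code points. The functions `runs`, `segment_lines`, `segment_glyphs` and `compose_letter` and the toy `page` are hypothetical names chosen for this example.

```python
import unicodedata

# Hypothetical representation: a binarised page, 1 = ink, 0 = background.
Image = list[list[int]]

def runs(profile: list[int]) -> list[tuple[int, int]]:
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # a blank entry closes the run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment_lines(page: Image) -> list[Image]:
    """Split a page into text lines: rows with no ink act as separators."""
    row_ink = [sum(row) for row in page]
    return [page[top:bottom] for top, bottom in runs(row_ink)]

def segment_glyphs(line: Image) -> list[Image]:
    """Split one text line into glyphs: columns with no ink act as separators."""
    col_ink = [sum(col) for col in zip(*line)]
    return [[row[left:right] for row in line] for left, right in runs(col_ink)]

def compose_letter(consonant: str, vowel_sign: str = "") -> str:
    """Build a letter from its components: base consonant plus optional vowel sign."""
    return consonant + vowel_sign

# Toy page: two text lines, the first with two glyphs, the second with one.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for n, line in enumerate(segment_lines(page), start=1):
        print(f"line {n}: {len(segment_glyphs(line))} glyph(s)")
    # Devanagari "ki" and Tamil "ki" each consist of two code points.
    for letter in (compose_letter("\u0915", "\u093F"), compose_letter("\u0B95", "\u0BBF")):
        print(letter, [unicodedata.name(c) for c in letter])
```

In Devanagari, the vowel sign ि is drawn to the left of the consonant, but Unicode stores it after the consonant. A recogniser that identifies the components separately therefore has to emit them in Unicode's logical order rather than in the visual left-to-right order. Real systems also have to deal with touching glyphs and the shared headline (shirorekha) of Devanagari and Bengali, which simple column projection cannot split.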

Source

The Hindu