• Fulltext

       


      Permanent link:
      https://www.ias.ac.in/article/fulltext/sadh/027/01/0083-0097

    • Keywords

       

      Document processing; optical character recognition; script identification; probabilistic neural network; multi-script multi-lingual document

    • Abstract

       

      The paper describes a neural network-based script identification system for the machine reading of documents written in English, Hindi and Kannada scripts. Script identification is a basic requirement for automating document processing in multi-script, multi-lingual environments. The system comprises a feature extractor and a modular neural network. The feature extractor works in two stages. In the first stage, the document image is dilated using 3 × 3 masks in the horizontal, vertical, right-diagonal and left-diagonal directions. In the second stage, the average pixel distribution is computed in each of the resulting images. The modular network combines feedforward neural network classifiers, each trained separately for one script. This system recognizes 64 × 64 pixel document images. At the next level, the system is modified to operate on single-word document images in the same three scripts. The modified system includes a pre-processor, a modified feature extractor and a probabilistic neural network classifier. The pre-processor segments the multi-script, multi-lingual document into individual words. The feature extractor receives these word images of variable size and still produces the discriminative features used by the probabilistic neural classifier. Experiments are conducted on a manually developed database of 64 × 64 pixel document images and on a database of individual words in the three scripts. The results are very encouraging and demonstrate the effectiveness of the approach.
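
      The abstract does not specify the exact mask shapes, how the "average pixel distribution" is computed, or the classifier's parameters, so the following is a hypothetical Python sketch, not the authors' implementation. It assumes binary 0/1 images, one plausible set of directional 3 × 3 masks (the orientation assigned to "right" and "left" diagonal is a guess), averaging of foreground density over an assumed 4 × 4 grid of blocks in each dilated image, and a standard Specht-style probabilistic neural network (a Gaussian Parzen-window classifier) with an assumed smoothing parameter `sigma`. All names (`MASKS`, `dilate`, `block_averages`, `extract_features`, `pnn_classify`) are illustrative. The modular feedforward network of the first system is not shown.

```python
# Hypothetical sketch (not the authors' code): directional dilation features
# plus a probabilistic neural network (PNN). Mask orientations, the 4 x 4
# block averaging and sigma are assumptions.
import math

# Assumed 3 x 3 structuring elements for the four directions.
MASKS = {
    "horizontal": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "vertical": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "right_diagonal": [[0, 0, 1], [0, 1, 0], [1, 0, 0]],  # assumed bottom-left to top-right
    "left_diagonal": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],   # assumed top-left to bottom-right
}


def dilate(img, mask):
    """Binary dilation of a 0/1 image; pixels outside the image count as 0.

    Each mask is symmetric under a 180-degree rotation, so the usual
    reflection of the structuring element can be skipped.
    """
    h, w = len(img), len(img[0])
    return [
        [
            int(any(
                mask[di][dj]
                and 0 <= i + di - 1 < h
                and 0 <= j + dj - 1 < w
                and img[i + di - 1][j + dj - 1]
                for di in range(3)
                for dj in range(3)
            ))
            for j in range(w)
        ]
        for i in range(h)
    ]


def block_averages(img, blocks=4):
    """Mean foreground density in each cell of a blocks x blocks grid.

    The grid scales with the image, so any image size yields blocks * blocks
    values (an assumed reading of "average pixel distribution").
    """
    h, w = len(img), len(img[0])
    feats = []
    for bi in range(blocks):
        rows = range(bi * h // blocks, (bi + 1) * h // blocks)
        for bj in range(blocks):
            cols = range(bj * w // blocks, (bj + 1) * w // blocks)
            n = len(rows) * len(cols)
            total = sum(img[r][c] for r in rows for c in cols)
            feats.append(total / n if n else 0.0)
    return feats


def extract_features(img, blocks=4):
    """Concatenate block densities of the four directionally dilated images."""
    feats = []
    for mask in MASKS.values():
        feats.extend(block_averages(dilate(img, mask), blocks))
    return feats


def pnn_classify(x, training, sigma=0.5):
    """Specht-style PNN: pick the class with the largest Gaussian Parzen density.

    `training` maps each class label to a list of feature vectors.
    """
    best_label, best_score = None, -1.0
    for label, samples in training.items():
        score = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * sigma ** 2))
            for s in samples
        ) / len(samples)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy usage on 8 x 8 images: one horizontal and one vertical stroke.
horiz = [[1 if r == 3 else 0 for c in range(8)] for r in range(8)]
vert = [[1 if c == 3 else 0 for c in range(8)] for r in range(8)]
training = {
    "horizontal": [extract_features(horiz)],
    "vertical": [extract_features(vert)],
}
query = [row[:] for row in horiz]
query[3][0] = 0  # a slightly broken horizontal stroke
print(len(extract_features(query)), pnn_classify(extract_features(query), training))
```

      Because the block grid scales with the image, a word image of any size yields the same 64-dimensional vector (4 masks × 16 blocks) under these assumptions, which is one plausible way to meet the variable-size requirement mentioned in the abstract; the authors' actual scheme may differ.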

    • Author Affiliations

       

      S Basavaraj Patil (1), N V Subbareddy (1, 2)

      1. Kuvempu University Research Centre, Department of Computer Science and Engineering, University B D T College of Engineering, Davangere - 577 004, India
      2. Department of Computer Science & Engineering, Manipal Institute of Technology, Manipal - 576 119, India

© 2021-2022 Indian Academy of Sciences, Bengaluru.