Articles written in Sadhana

    • NN-based analytic approach to symbol level recognition for degraded Bengali printed documents



      Analysis of degraded printed documents has been a research topic for the last several years. This article contributes to the segmentation of word images into symbols and to the recognition of those symbols in degraded printed document images of Bengali, the 7th most popular language in the world. A novel approach to symbol level segmentation based on a Multilayer Perceptron (MLP) network is proposed. A database of segmenting and non-segmenting image columns is developed from the ISIDDI page level database, and segmentation is treated as a two-class classification problem. The MLP weights are learnt from this database using the back propagation algorithm. We introduce certain new metrics, based on which the F-score of the proposed segmentation algorithm is determined. Our method utilizes only the information that is relevant for character segmentation and ignores other highly variable information contained in a printed text document, thus allowing for efficient transfer learning between datasets and alleviating the need for labelled training data. Besides Bengali, we have tested the method on English, Tamil and Devnagari scripts. For classification, we have identified 336 symbols and developed the corresponding training and test sets from the ISIDDI database. Two classifiers, one CNN based and the other LSTM based, have been developed for this 336-class problem. The classification accuracies obtained on the test set by the CNN classifier and the LSTM classifier are 86.05% and 88.11%, respectively. The proposed classifiers outperform the existing classifiers for the ISIDDI database.
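
      To make the two-class framing of segmentation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a binarized word image stored as a 2-D NumPy array and builds one feature vector per image column from a small window of neighbouring columns; such vectors could then be fed to any binary classifier, such as an MLP. The helper names `column_windows` and `f_score`, the window half-width, and the feature design are all hypothetical. The score shown is the standard F-score over column labels, not the new metrics introduced in the article.

```python
import numpy as np


def column_windows(word_img, half_width=2):
    """Return one feature vector per image column (hypothetical feature design).

    Each vector is the flattened window of columns
    [c - half_width, c + half_width], with zero (background) padding
    at the left and right borders of the word image.
    """
    img = np.asarray(word_img, dtype=np.float32)
    height, width = img.shape
    padded = np.pad(img, ((0, 0), (half_width, half_width)), mode="constant")
    win = 2 * half_width + 1
    feats = np.stack([padded[:, c:c + win].ravel() for c in range(width)])
    return feats  # shape: (width, height * win)


def f_score(y_true, y_pred):
    """Standard F-score for binary labels (1 = segmenting column)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # segmenting columns correctly found
    fp = np.sum(~y_true & y_pred)   # non-segmenting columns flagged
    fn = np.sum(y_true & ~y_pred)   # segmenting columns missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


if __name__ == "__main__":
    # Toy 4x6 "word image" with a single inked column.
    img = np.zeros((4, 6))
    img[:, 1] = 1
    print(column_windows(img, half_width=1).shape)
    print(f_score([1, 0, 1, 1], [1, 1, 0, 1]))
```

      Each row of the returned feature matrix corresponds to one candidate cut position, so a classifier trained on these rows labels every column as segmenting or non-segmenting.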


© 2021-2022 Indian Academy of Sciences, Bengaluru.