• Arun Agarwal

      Articles written in Sadhana

    • Understanding paper documents

      Arun Agarwal

      We describe the organization and several components of an automated document processing system that begins with digitized images of documents and produces representations at successively higher levels. Such representations include: the visual sketch (connected components extracted from the background), physical layout (spatial extents of blocks corresponding to text and graphics), logical layout (grouping of strings into words and phrases), and block primitives (e.g., recognised characters and words in text blocks, and recognised hand-drawn line drawings such as schematic electronic circuits). We describe algorithms for deriving several of these representations and explain how the different modules interact. The methods are illustrated with examples.
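
      The "visual sketch" step mentioned in the abstract, extracting connected components from the background, can be illustrated with a minimal sketch. The abstract does not give the system's actual algorithm, so the following rests on assumptions: the page is already binarized (1 = ink, 0 = background), pixels are joined by 8-connectivity, and each component is summarized by its bounding box. The function name `connected_components` and the sample `page` are hypothetical.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of ink components.

    `image` is a list of rows of 0/1 values; 1 marks ink (assumption).
    Components are found by breadth-first flood fill with 8-connectivity.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Start a new component at the first unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all eight neighbours of the current pixel.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 3x5 binarized page with two separate ink blobs.
page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
boxes = connected_components(page)
```

      Bounding boxes of this kind are a plausible input to a later physical layout stage, which groups nearby components into text or graphics blocks; how the described system does this is not stated in the abstract.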

    • A string matching based algorithm for performance evaluation of mathematical expression recognition

      P Pavan Kumar, Arun Agarwal, Chakravarthy Bhagvati

      In this paper, we address the problem of automated performance evaluation of Mathematical Expression (ME) recognition. Automated evaluation requires matching the recognition output against the ground truth, both expressed in some editable format such as LaTeX or MathML. However, these standard formats can contain extraneous symbols or tags. For example, an <mo> tag is added for each operator in MathML, and \begin{array} is used to encode matrices in LaTeX. When such extraneous symbols take part in the matching, the resulting scores are not intuitive. To avoid this, we propose a novel structure encoded string representation that is independent of any editable format. Structure encoded strings retain the structure of an expression (spatial relationships such as superscript and subscript) and do not contain any extraneous symbols. Because structure encoded strings give a linear representation of MEs, Levenshtein edit distance can be used as the measure for performance evaluation. In our approach, therefore, the recognition output and the ground truth in LaTeX form are converted to their corresponding structure encoded strings, and the Levenshtein edit distance between them is computed.
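
      The final comparison step can be sketched as a standard Levenshtein edit distance over token sequences. The abstract does not define the structure encoded string format, so the token lists below (with `^` for superscript and `_` for subscript) are a hypothetical stand-in, not the authors' encoding; the function name `levenshtein` is likewise an illustrative choice.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two sequences of tokens.

    Counts the minimum number of insertions, deletions, and
    substitutions needed to turn `a` into `b`, using the classic
    dynamic programme with one row of the table kept at a time.
    """
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x by y (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical encodings: ground truth is "x squared plus y",
# the recognizer mistook the superscript for a subscript.
ground_truth = ["x", "^", "2", "+", "y"]
recognized = ["x", "_", "2", "+", "y"]

distance = levenshtein(ground_truth, recognized)
```

      Working on tokens rather than raw characters means a multi-character symbol name counts as a single unit, so one recognition error contributes one edit; whether the paper tokenizes this way is an assumption.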

© 2021-2022 Indian Academy of Sciences, Bengaluru.