Articles written in Sadhana
Volume 45 Published: 4 June 2020 Article ID 0146
Takri is a class of Indian regional scripts used in the hilly areas of north-west India, including Jammu and Kashmir (J & K), Himachal Pradesh (H.P.), Punjab and Uttarakhand. The script has immense variation; almost 13 variants have been identified across the region. It has been observed that no work on text identification and recognition of Takri script has been done so far. Therefore, our work focuses on identifying and classifying the various challenges in the script, based on a comparative analysis of existing text segmentation approaches, as correct segmentation of text leads to more accurate machine recognition. As no metal fonts were available for the script, machine-printed data had to be collected to address the text identification problem in Takri script. The paper surveys different text segmentation approaches and, based on the structural properties of the script, shows an implementation of these on Takri data in three steps: the Gurmukhi segmentation technique, the Connected Component segmentation approach, and the Gurmukhi touching-characters segmentation approach. Results are analyzed for segmentation accuracy, and challenges are identified along with their statistical analysis. Further, the identified challenges, namely half-forms, numerous types of touching characters, and overlapping bounding boxes, are classified. The effectiveness of this classification was evaluated using a Naïve Bayesian machine learning algorithm. The results showed 80% accuracy in text identification and classification of Takri script.
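To make the connected-component step and the "overlapping bounding boxes" challenge concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a binarized image given as a list of rows of 0/1 values and uses 8-connectivity; the function names `connected_components` and `boxes_overlap` and the toy image are hypothetical and chosen only for illustration.

```python
from collections import deque


def connected_components(image):
    """Find 8-connected foreground (1) regions and return their bounding boxes.

    Each box is (top, left, bottom, right), inclusive, in row/column order.
    Boxes are returned in raster-scan order of each region's first pixel.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def boxes_overlap(a, b):
    """Return True if two inclusive (top, left, bottom, right) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Toy binary image (assumed data): three components. The third is a single
# pixel that lies inside the bounding box of the second without touching it.
toy = [
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
]
boxes = connected_components(toy)
```

In this toy case the second and third components are distinct, yet their bounding boxes intersect, so a segmenter that crops by bounding box alone would pull part of one component into the other's crop. This is one plausible reading of why overlapping bounding boxes are listed among the challenges.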