    • Keywords


      Assembly validation; clustering; contigs; de novo assembly; mis-assembly; next generation sequencing; reads

    • Abstract


      With the advent of short-read-based genome sequencing approaches, a large number of organisms are being sequenced all over the world. Most of these genomes are assembled using de novo short-read assemblers and related approaches. However, the contigs produced this way are prone to mis-assembly. So far, there is a conspicuous dearth of reliable tools to identify mis-assembled contigs. Mis-assemblies could result from incorrectly deleted or wrongly arranged genomic sequences. In the present work, various factors related to sequence, sequencing and assembling have been assessed for their role in causing mis-assembly, using different genome sequencing data. Finally, some mis-assembly detection tools have been evaluated for their ability to detect wrongly assembled primary contigs, and the results suggest considerable scope for improvement in this area. The present work also proposes a simple, novel, unsupervised learning-based approach to identify mis-assemblies in contigs, which was found to perform reasonably well when compared with existing tools for reporting mis-assembled contigs. It was observed that the proposed methodology may work as a complementary system to the existing tools to enhance their accuracy.

    • Author Affiliations



      1. Studio of Computational Biology & Bioinformatics, Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
      2. Department of Biotechnology, Guru Nanak Dev University, Amritsar, Punjab, India
  • Journal of Biosciences | News

© 2021-2022 Indian Academy of Sciences, Bengaluru.