    • Keywords


      Canonical correlation analysis; DNA sequence; pattern recognition

    • Abstract


      We used canonical correlation analysis (CCA) as an unsupervised statistical tool that describes related views of the same semantic object in order to identify patterns. A pattern recognition technique based on CCA is proposed for finding a required genetic code in a DNA sequence. Two related but different objects were considered: one was a particular pattern, and the other was a test DNA sequence. CCA found correlations between the two observations, the pattern and the test sequence. We conclude that the correlation reaches its maximum value at the position where the pattern occurs. As a case study, the potential of CCA was demonstrated on sequences from HIV-1 preferred integration sites. The subsequences flanking the integration site on the left and right were treated as the two views, and statistically significant relationships were established between these two views, indicating that viral preference is an important factor in the correlation.
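
      As an illustration of the idea that the correlation peaks where the pattern occurs, the following is a minimal sketch and not the authors' implementation. It assumes a one-hot encoding of the bases, treats the positions inside a window as the samples of each view, and slides a window of the pattern's length along the test sequence. The helper names (one_hot, orthonormal_basis, first_canonical_correlation, scan) and the example sequences are hypothetical. The first canonical correlation is computed as the largest singular value of the product of orthonormal bases of the two centered views.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed encoding order; not specified in the abstract


def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) matrix of 0/1 indicators."""
    idx = np.array([ALPHABET.index(base) for base in seq])
    return np.eye(4)[idx]


def orthonormal_basis(view, tol=1e-10):
    """Orthonormal basis for the column space of the column-centered view."""
    centered = view - view.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, s > tol]  # drop directions with (numerically) zero variance


def first_canonical_correlation(x, y):
    """Largest canonical correlation between two views with equal row counts."""
    qx, qy = orthonormal_basis(x), orthonormal_basis(y)
    if qx.shape[1] == 0 or qy.shape[1] == 0:
        return 0.0  # a constant view (e.g. a homopolymer window) carries no variance
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip floating-point overshoot above 1


def scan(pattern, sequence):
    """Score every window of len(pattern) in sequence against the pattern."""
    m = len(pattern)
    p = one_hot(pattern)
    return np.array([
        first_canonical_correlation(p, one_hot(sequence[i:i + m]))
        for i in range(len(sequence) - m + 1)
    ])


# Hypothetical example: a pattern planted at offset 10 of a made-up sequence.
pattern = "GATTACAGGCTA"
sequence = "CCGTAAGCTT" + pattern + "TTGACGGTAC"
planted = 10
scores = scan(pattern, sequence)
print(scores[planted], scores.max())
```

      In this sketch the window identical to the pattern scores 1 by construction, which is the upper bound for any window. Because CCA is invariant to invertible linear maps of each view, a window that is a consistent relabelling of the pattern's bases can also reach 1, and short windows inflate scores in general, so a real analysis would need longer views and a significance test.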

    • Author Affiliations


      B K Sarkar¹, Chiranjib Chakraborty²

      1. Department of Physics, School of Basic & Applied Sciences, Galgotias University, Greater Noida, India
      2. Department of Bioinformatics, School of Computer Sciences, Galgotias University, Greater Noida, India

  • Journal of Biosciences | News


© 2021-2022 Indian Academy of Sciences, Bengaluru.