      Russiachand Heikham Ravi Shankar

      The non-coding elements of a genome, many of which were earlier dismissed as junk, have now started gaining long-overdue respectability, with microRNAs as the best current example. MicroRNAs bind preferentially to the 3′ untranslated regions (UTRs) of their target genes and, most of the time, negatively regulate their expression. Several microRNA:target prediction tools have been developed based upon various assumptions, and the majority of them consider the free energy of binding of a target to its microRNA and seed conservation. However, the average concordance between the predictions made by these tools is limited, and the problem is compounded by a large number of false-positive results. In this study, we describe a methodology that we developed to refine the output of microRNA:target prediction tools, based on observations made from a comprehensive study. We incorporated the information obtained from dinucleotide content variation patterns recorded for flanking regions around the target sites using support vector machines (SVMs) trained over two different major sources of experimental data, besides other sources. We assessed the performance of our methodology with rigorous tests over four different dataset models and also compared it with a recently published refinement tool, MirTif. Our methodology attained a higher average accuracy of 0.88, average sensitivity and specificity of 0.81 and 0.94, respectively, and the areas under the curves (AUCs) for all four models were above 0.9, suggesting better performance by our methodology and a possible role of flanking regions in microRNA targeting control. We applied our methodology to genes of three different pathways – toll-like receptor (TLR), apoptosis and insulin – to predict the most probable targets. We also investigated their possible regulatory associations, and identified an hsa-miR-23a regulatory module.
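
To make the idea of dinucleotide content in flanking regions concrete, the following is a minimal sketch of how such features might be computed before being passed to a classifier such as an SVM. It is not the authors' implementation: the function names, the flank length of 30 nt, and the choice of simple overlapping-pair frequencies are all assumptions made for illustration.

```python
from itertools import product

# The 16 possible dinucleotides, in a fixed order (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Return 16 overlapping-dinucleotide frequencies for a sequence.

    RNA input is accepted by mapping U to T. Pairs containing
    ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper().replace("U", "T")
    counts = {d: 0 for d in DINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def flank_features(sequence, site_start, site_end, flank=30):
    """Build a 32-value feature vector from the two flanks of a target site.

    site_start and site_end are 0-based, end-exclusive coordinates of the
    predicted target site. The flank length is a hypothetical default.
    """
    upstream = sequence[max(0, site_start - flank):site_start]
    downstream = sequence[site_end:site_end + flank]
    return dinucleotide_frequencies(upstream) + dinucleotide_frequencies(downstream)
```

A vector of this kind, computed for validated and non-validated target sites, could then serve as input to any standard SVM library; the kernel and training data would need to be chosen separately.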

      Ashwani Jha Mrigaya Mehra Ravi Shankar

      miRNAs are small non-coding RNAs with an average length of ∼21 bp. miRNA formation seems to depend upon multiple factors besides Drosha and Dicer, in a tissue/stage-specific manner, with interplay of several specific binding factors. In the present study, we have investigated transcription factor binding sites in and around the genomic sequences of precursor miRNAs, and RNA-binding protein (RBP) sites in miRNA precursor sequences, analysed and tested in a comprehensive manner. Here, we report that miRNA precursor regions are positionally enriched for binding of transcription factors as well as RBPs around the 3′ end of the mature miRNA region in the 5′ arm. The pattern and distribution of such regulatory sites appear to be characteristic of precursor miRNA sequences when compared with non-miRNA sequences as a negative dataset and tested statistically. When compared with 1 kb upstream regions, a sudden sharp peak for binding sites arises in the enriched zone near the mature miRNA region. An expression-data-based correlation analysis was performed between such miRNAs and their corresponding transcription factors and RBPs for this region. Some specific groups of binding factors and associated miRNAs were identified. We also identified some of the overrepresented transcription factors and associated miRNAs with high expression correlation values, which could be useful in cancer-related studies. The highly correlated groups were found to host experimentally validated composite regulatory modules, in which Lmo2-GATA1 appeared as the predominant one. For many of the RBP–miRNA associations, co-expression similarity was also evident among the associated miRNAs common to given RBPs, supporting the Regulon model and suggesting a common role and common control of these miRNAs by the associated RBPs.
Based on our findings, we propose that the observed characteristic distribution of regulatory sites in precursor miRNA sequence regions could be critical in miRNA transcription, processing, stability and formation, and is important for therapeutic studies. Our findings also support the recently proposed theory of a self-sufficient mode of transcription by miRNAs, which states that miRNA transcription can also be carried out in a host-independent mode.
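
As an illustration of what an expression-based correlation between a miRNA and a binding factor can look like, the sketch below computes a Pearson correlation coefficient between two expression profiles measured over the same samples. The abstract does not state which correlation measure was used, so the choice of Pearson correlation and the function name are assumptions.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length expression profiles.

    Returns NaN when either profile is constant, since the coefficient
    is undefined in that case.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return covariance / math.sqrt(var_x * var_y)
```

Each profile would hold the expression values of one miRNA or one factor across a shared set of tissues or conditions; pairs with coefficients near +1 or -1 are the ones a study of this kind would flag as strongly associated.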

      Ashwani Jha Ravi Shankar

      DNA methylation is a type of epigenetic modification in which a methyl group is added to a cytosine or adenine residue of a given DNA sequence. It has been observed that DNA methylation is achieved through the collaborative assembly of certain proteins and non-coding RNAs. The assembly of IDN2 and its homologous proteins with siRNAs recruits the enzyme DRM2, which adds a methyl group at certain cytosine residues within the DNA sequence. In this study, it was found that de novo DNA methylation might be regulated by miRNAs through systematic targeting of the genes involved in DNA methylation. A comprehensive genome-wide and system-level study of miRNA targeting, transcription factors, DNA-methylation-causing genes and their target genes has provided a clear picture of the interconnected relationships among all these factors, which regulate DNA methylation in Arabidopsis. The study has identified a DNA methylation system that is controlled by four different genes: IDN2, IDNl1, IDNl2 and DRM2. These four genes, along with various critical transcription factors, appear to be controlled by five different miRNAs. Altogether, DNA methylation appears to be a finely tuned process in which two opposing control systems, the DNA-methylation-causing genes and certain miRNAs, are pitted against each other.

      BL Manjunatha HR Singh G Ravikanth Karaba N Nataraja Ravi Shankar Sanjay Kumar R Uma Shaanker

      Camptothecin (CPT), a monoterpene indole alkaloid, is a potent inhibitor of DNA topoisomerase I and has applications in treating ovarian, small lung and refractory ovarian cancers. Stem wood tissue of Nothapodytes nimmoniana (Graham) Mabb. (family Icacinaceae) is one of the richest sources of CPT. Since no genomic or transcriptome data are available for the species, the present work sequenced and analysed the transcriptome of stem wood tissue on an Illumina platform. From a total of 77,55,978 reads, 9,187 transcripts were assembled with an average length of 255 bp. Functional annotation and categorization of these assembled transcripts unraveled the transcriptome architecture, and a total of 13 genes associated with the CPT biosynthetic pathway were identified in the stem wood tissue. Four genes of the pathway were cloned to full length by RACE to validate the transcriptome data. Expression analysis of the 13 genes associated with the CPT biosynthetic pathway in 11 different tissues, vis-a-vis CPT content analysis, suggested an important role of the NnPG10H, NnPSLS and NnPSTR genes in the biosynthesis of CPT. These results indicated that CPT might be synthesized in the leaves and then perhaps exported to stem wood tissue for storage.

      With the advent of short-read-based genome sequencing approaches, a large number of organisms are being sequenced all over the world. Most of these assemblies are done using de novo short-read assemblers and other related approaches. However, the contigs produced this way are prone to wrong assembly. So far, there is a conspicuous dearth of reliable tools to identify mis-assembled contigs. Mis-assemblies could result from incorrectly deleted or wrongly arranged genomic sequences. In the present work, various factors related to sequence, sequencing and assembling have been assessed for their role in causing mis-assembly by using different genome sequencing data. Finally, some mis-assembly detecting tools have been evaluated for their ability to detect wrongly assembled primary contigs, suggesting a lot of scope for improvement in this area. The present work also proposes a simple, novel unsupervised-learning-based approach to identify mis-assemblies in contigs, which was found to perform reasonably well when compared with the existing tools for reporting mis-assembled contigs. It was observed that the proposed methodology may work as a complementary system to the existing tools to enhance their accuracy.

