Articles written in Journal of Biosciences
Volume 32 Issue 5 August 2007 pp 871-881 Articles
Gene and protein sequence analyses, central components of studies in modern biology are easily amenable to string matching and pattern recognition algorithms. The growing need of analysing whole genome sequences more efficiently and thoroughly, has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures, well known in many other areas and are highly suited for sequence analysis too. Here we report an improvement to the design of construction of suffix arrays. Enhancement in versatility and scalability, enabled by this approach, is demonstrated through the use of real-life examples.
The scalability of the algorithm to whole genomes renders it suitable to address many biologically interesting problems. One example is the evolutionary insight gained by analysing unigrams, bi-grams and higher n-grams, indicating that the genetic code has a direct influence on the overall composition of the genome. Further, different proteomes have been analysed for the coverage of the possible peptide space, which indicate that as much as a quarter of the total space at the tetra-peptide level is left un-sampled in prokaryotic organisms, although almost all tri-peptides can be seen in one protein or another in a proteome. Besides, distinct patterns begin to emerge for the counts of particular tetra and higher peptides, indicative of a ‘meaning’ for tetra and higher n-grams.
The toolkit has also been used to demonstrate the usefulness of identifying repeats in whole proteomes efficiently. As an example, 16 members of one COG, coded by the genome of
Volume 32 Issue 5 August 2007 pp 965-977 Articles
The biological cell, a natural self-contained unit of prime biological importance, is an enormously complex machine that can be understood at many levels. A higher-level perspective of the entire cell requires integration of various features into coherent, biologically meaningful descriptions. There are some efforts to model cells based on their genome, proteome or metabolome descriptions. However, there are no established methods as yet to describe cell morphologies, capture similarities and differences between different cells or between healthy and disease states. Here we report a framework to model various aspects of a cell and integrate knowledge encoded at different levels of abstraction, with cell morphologies at one end to atomic structures at the other. The different issues that have been addressed are ontologies, feature description and model building. The framework describes dotted representations and tree data structures to integrate diverse pieces of data and parametric models enabling size, shape and location descriptions. The framework serves as a first step in integrating different levels of data available for a biological cell and has the potential to lead to development of computational models in our pursuit to model cell structure and function, from which several applications can flow out.