pp 1-2 January 2007
pp 3-15 January 2007 Articles
Simple sequence repeats (SSRs), or microsatellites, are repetitive nucleotide sequences built from motifs 1–6 bp long. They are scattered throughout the genomes of all known organisms, from viruses to eukaryotes. Microsatellites undergo mutations in the form of insertions and deletions (INDELs) of their repeat units, with some bias towards insertions, which lead to expansion of the microsatellite tract. Although prokaryotic genomes derive some plasticity from microsatellite mutations, they have in-built mechanisms to arrest undue expansion of microsatellites; one such mechanism is the post-replicative DNA repair system formed by the enzymes MutL, MutH and MutS. The mycobacterial genomes lack these enzymes, so, as a null hypothesis, one would expect them to harbour many long tracts. It is therefore interesting to analyse the mycobacterial genomes for the distribution and abundance of microsatellite tracts and to look for potentially polymorphic microsatellites. The available mycobacterial genomes, Mycobacterium avium, M. leprae, M. bovis and the two strains of M. tuberculosis (CDC1551 and H37Rv), were analysed for the frequency and abundance of SSRs. Our analysis revealed that SSRs are distributed throughout the mycobacterial genomes at an average of 220–230 SSR tracts per kb. All the mycobacterial genomes contain a few regions that are conspicuously denser or poorer in microsatellites than expected from their genome averages. The genomes distinctly show a scarcity of long microsatellites despite the absence of a post-replicative DNA repair system. Such severe scarcity of long microsatellites could result from strong selection pressure against long, unstable sequences, although the influence of GC content and the role of point mutations in arresting microsatellite expansion cannot be ruled out. Nonetheless, the long tracts occasionally found in coding as well as non-coding regions may account for limited genome plasticity in these genomes.
pp 17-29 January 2007 Articles
The sequence motifs present in the replication initiator protein (Rep) of geminiviruses have been compared with those present in all known rolling circle replication initiators. The predicted secondary structures of Rep proteins representing each group of organisms have been compared and found to be conserved. Regions of recombination in the Rep gene and the adjoining 5′ intergenic region (IR) of representative species of Geminiviridae have been identified using Recombination Detection Programs. The possible implications of such recombination events for the increasing host range of geminivirus infections are discussed.
pp 31-42 January 2007 Articles
In the present study, a systematic attempt has been made to develop an accurate method for predicting MHC class I restricted T cell epitopes for a large number of MHC class I alleles. Initially, a quantitative matrix (QM)-based method was developed for 47 MHC class I alleles having at least 15 binders. Subsequently, an artificial neural network (ANN)-based method was developed for the 30 of these 47 alleles that have a minimum of 40 binders. Combining the ANN- and QM-based methods for these 30 alleles improved prediction accuracy by 6% compared with either individual method; the average accuracy of the hybrid method for the 30 alleles is 92.8%. The method also allows prediction of binders for 20 additional alleles using QMs reported in the literature, thus allowing prediction for 67 MHC class I alleles. Performance was evaluated using a jack-knife validation test and on blind (independent) data. Comparison with existing MHC binder prediction methods, for alleles studied by both, shows that our method is superior. The method also identifies proteasomal cleavage sites in antigen sequences by implementing previously described matrices. It therefore allows the identification of promiscuous MHC class I binders (peptides that bind many MHC alleles) having a proteasomal cleavage site at the C-terminus. The user-friendly result display format (HTML-II) can assist in locating promiscuous MHC binding regions in an antigen sequence. The method is available on the web at www.imtech.res.in/raghava/nhlapred and its mirror site is available at http://bioinformatics.uams.edu/mirror/nhlapred/.
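A quantitative matrix scores a peptide additively, summing one weight per residue per position. The sketch below scans every 9-residue window of an antigen and keeps windows scoring above a threshold. `TOY_MATRIX`, its weights and the threshold are hypothetical illustrations, not the matrices used by the server, and the ANN and proteasomal-cleavage components are not shown.

```python
# Toy 9-position quantitative matrix: one dict of residue weights per position.
# The weights are hypothetical; residues not listed contribute 0.
TOY_MATRIX = [{} for _ in range(9)]
TOY_MATRIX[1]["L"] = 2.0  # hypothetical preference at position 2
TOY_MATRIX[8]["V"] = 2.0  # hypothetical preference at position 9


def score_peptide(peptide, matrix):
    """Additive QM score: sum of the per-position weights of each residue."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match matrix length")
    return sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(peptide))


def scan_antigen(sequence, matrix, threshold):
    """Score every window of an antigen; return predicted binders, best first."""
    k = len(matrix)
    hits = []
    for start in range(len(sequence) - k + 1):
        peptide = sequence[start:start + k]
        score = score_peptide(peptide, matrix)
        if score >= threshold:
            hits.append((start, peptide, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

With these toy weights, `scan_antigen("MLAAAAAAVKL", TOY_MATRIX, 3.0)` returns `[(0, 'MLAAAAAAV', 4.0)]`.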
pp 43-50 January 2007 Articles
Nuclear hormone receptors (NRs) form a large superfamily of ligand-activated transcription factors that regulate genes underlying a wide range of (patho)physiological phenomena. The availability of the full genome sequence of Tetraodon nigroviridis facilitated a genome-wide analysis of NRs in a fish genome. Seventy-one NRs were found in Tetraodon and were compared with mammalian and fish NR family members. In general, NRs are more highly represented in fish genomes than in mammalian ones. Phylogenetic analysis showed high diversity across classes. Nucleotide substitution rates indicate strong negative selection among fish NRs, except for the pregnane X receptor (PXR), estrogen receptor (ER) and liver X receptor (LXR). These receptors may have undergone slight positive selection, which might be attributed to their crucial roles in the metabolism and detoxification of xenobiotic and endobiotic compounds. Chromosomal mapping and pairwise comparisons of NR distribution in Tetraodon and humans led to the identification of nine syntenic NR regions, of which three are common among fully sequenced vertebrate genomes. Gene structure analysis shows strong conservation of exon structure among orthologues, whereas paralogous members show different splicing patterns, suggesting that intron gain or loss and the addition or substitution of exons played a major role in the evolution of the NR superfamily.
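Selection is commonly read from the ratio ω = dN/dS of nonsynonymous to synonymous substitution rates: ω < 1 suggests negative (purifying) selection and ω > 1 positive selection. The abstract does not say which estimator was used; the sketch below assumes the final step of the Nei–Gojobori approach (Jukes–Cantor-corrected proportions from already counted sites and differences). The `tolerance` cut-off in `classify_selection` is hypothetical and is no substitute for a proper statistical test.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs):
    """dN/dS from site and difference counts (final step of Nei-Gojobori)."""
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if d_s == 0.0:
        raise ValueError("no synonymous divergence; dN/dS undefined")
    return d_n / d_s


def classify_selection(w, tolerance=0.1):
    """Rough reading of omega; the tolerance is a hypothetical cut-off, not a test."""
    if abs(w - 1.0) <= tolerance:
        return "neutral"
    return "negative (purifying)" if w < 1.0 else "positive"
```

For instance, 10 differences at 100 synonymous sites and 3 at 300 nonsynonymous sites give ω of roughly 0.09, i.e. strong purifying selection.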
pp 51-70 January 2007 Articles
The description of protein 3D structures can be performed through a library of 3D fragments, named a structural alphabet. Our structural alphabet is composed of 16 small protein fragments, 5 Cα long, called protein blocks (PBs). It allows an efficient approximation of 3D protein structures and a correct prediction of local structure. The 72 most frequent series of 5 consecutive PBs, called structural words (SWs), cover more than 90% of the 3D structures. Successions of PBs are strongly constrained, since only a limited number of transitions between them occur. In this study, we propose a new method, called the “pinning strategy”, that uses this specific feature to predict long protein fragments. Its goal is to define highly probable successions of PBs: it starts from the most probable SW, which is then extended with overlapping SWs. Starting from an initial prediction rate of 34.4%, using SWs instead of PBs yields a gain of 4.5%. Applying the pinning strategy to the SWs increases the prediction accuracy to 39.9%. When, in a second step, the sequence-structure relationship is optimized, the prediction accuracy reaches 43.6%.
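The pinning idea (seed with the most probable structural word, then extend with words overlapping the current prediction) can be sketched greedily as below. This is an illustrative sketch, not the authors' algorithm: it assumes per-position PB probabilities are already available as dicts, scores words by summed log-probability, and extends one PB at a time using words that overlap the current end by four PBs. The letters used in the example are arbitrary toy labels.

```python
import math


def log_score(word, probs, start):
    """Sum of log-probabilities of the PBs in `word` placed at `start`."""
    total = 0.0
    for offset, pb in enumerate(word):
        p = probs[start + offset].get(pb, 0.0)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def pin_and_extend(probs, words):
    """Greedy sketch: seed with the best-scoring SW, extend with overlapping SWs."""
    words = sorted(words)  # deterministic tie-breaking
    n, k = len(probs), len(words[0])
    # 1. Pin: the most probable SW over all start positions.
    _, seed_start, seed = max(
        (log_score(w, probs, s), s, w) for s in range(n - k + 1) for w in words)
    pred = list(seed)
    left, right = seed_start, seed_start + k  # predicted span is [left, right)
    # 2. Extend rightwards with SWs sharing the last k-1 PBs.
    while right < n:
        overlap = "".join(pred[-(k - 1):])
        candidates = [w for w in words if w[:-1] == overlap]
        if not candidates:
            break
        best = max(candidates, key=lambda w: probs[right].get(w[-1], 0.0))
        pred.append(best[-1])
        right += 1
    # 3. Extend leftwards with SWs sharing the first k-1 PBs.
    while left > 0:
        overlap = "".join(pred[:k - 1])
        candidates = [w for w in words if w[1:] == overlap]
        if not candidates:
            break
        best = max(candidates, key=lambda w: probs[left - 1].get(w[0], 0.0))
        pred.insert(0, best[0])
        left -= 1
    return left, "".join(pred)
```

Restricting extensions to known SWs is what exploits the limited set of PB transitions: a PB that never follows the current context cannot be chosen, however probable it is locally.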
pp 71-81 January 2007 Articles
Automated protein tertiary structure prediction from sequence information alone remains an elusive goal for computational approaches. By dividing the problem into three stages, viz. secondary structure prediction, generation of plausible main-chain loop dihedrals and side-chain dihedral optimization, considerable progress has been achieved in our laboratory (http://www.scfbio-iitd.res.in/bhageerath/index.jsp) and elsewhere for proteins with fewer than 100 amino acids. As part of our ongoing efforts in this direction, and to help select or reject trial tertiary structures so as to contain their combinatorial explosion for a specified amino acid sequence, we describe here a web-enabled tool, ProRegIn (Protein Regularity Index), based on the regularity in the Φ, Ψ dihedral angles of the amino acids that constitute loop regions. We have analysed the dihedrals in loop regions in a non-redundant dataset of 7351 proteins drawn from the Protein Data Bank and categorized them as helix-like or sheet-like (regular) or irregular. We noticed that the regularity thus defined exceeds 86% for Φ for all amino acids except glycine, and 70% for Ψ for all amino acids including glycine, compelling us to re-examine the conventional view that loops are structurally irregular regions. The regularity index is presented here as a simple tool for protein structure analysis, serving as a discriminatory scoring function for rapid screening before more compute-intensive atomic-level energy calculations are undertaken. The tool is freely accessible over the internet at www.scfbio-iitd.res.in/software/proregin.jsp.
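A regularity index of this kind reduces to labelling each loop dihedral as helix-like, sheet-like or irregular and reporting the regular fraction. The angle windows below are hypothetical boundaries chosen for illustration; ProRegIn's actual definitions are not given in the abstract.

```python
# Hypothetical angle windows in degrees; ProRegIn's actual boundaries may differ.
PHI_WINDOWS = {"helix-like": (-100.0, -30.0), "sheet-like": (-180.0, -100.0)}
PSI_WINDOWS = {"helix-like": (-80.0, -10.0), "sheet-like": (90.0, 180.0)}


def classify(angle, windows):
    """Label one dihedral as helix-like, sheet-like or irregular."""
    for label, (low, high) in windows.items():
        if low <= angle <= high:
            return label
    return "irregular"


def regularity_index(angles, windows):
    """Fraction of dihedrals falling in a regular (helix- or sheet-like) window."""
    if not angles:
        raise ValueError("no angles supplied")
    regular = sum(1 for a in angles if classify(a, windows) != "irregular")
    return regular / len(angles)
```

To mirror the Φ statistic "barring glycine", glycine residues would simply be filtered out of the Φ list before calling `regularity_index`.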
pp 83-96 January 2007 Articles
Several studies based on the known three-dimensional (3-D) structures of proteins show that two homologous proteins with insignificant sequence similarity can adopt a common fold and may perform the same or similar biochemical functions. Hence, it is appropriate to use similarities in the 3-D structures of proteins, rather than amino acid sequence similarities, in modelling the evolution of distantly related proteins. Here we present an assessment of the use of 3-D structures in modelling the evolution of homologous proteins. Using a dataset of 108 protein domain families of known structure, each with at least 10 members, we compare the extent of structural and sequence dissimilarity among pairs of proteins, which are the inputs for constructing phylogenetic trees. We find that the correlation between structure-based and sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among members, the correlation coefficient between sequence-based and structure-based dissimilarities is poor. In these cases, the structure-based dendrogram clusters proteins with the most similar biochemical functional properties better than the sequence-similarity-based dendrogram does. In multi-domain and disulphide-rich protein families, the correlation coefficient between sequence-based and structure-based dissimilarity (SDM) measures can be poor even though sequence identity may be higher than 30%. Hence, we suggest that protein evolution is best modelled using 3-D structures when the sequence similarities (SSM) of the homologues are very low.
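Comparing sequence-based and structure-based dissimilarities amounts to correlating the off-diagonal entries of two pairwise matrices over the same proteins. A minimal sketch using the Pearson coefficient follows; which correlation measure the study used is an assumption here. Because pairwise entries are not independent, assessing significance would need a permutation (Mantel-type) test rather than the usual test for r.

```python
import math


def upper_triangle(matrix):
    """Pairwise entries above the diagonal of a square, symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def matrix_correlation(seq_dissim, struct_dissim):
    """Correlate sequence- and structure-based dissimilarities over all protein pairs."""
    return pearson(upper_triangle(seq_dissim), upper_triangle(struct_dissim))
```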
pp 97-100 January 2007 Articles
Large-scale genome sequencing and structural genomics projects generate numerous sequences and structures for ‘hypothetical’ proteins without functional characterization. Detection of homology to experimentally characterized proteins can provide functional clues, but the accuracy of homology-based predictions is limited by the paucity of tools for quantitative comparison of the diverging residues responsible for functional divergence. SURF’S UP! is a web server for analysing functional relationships in protein families, as inferred from comparison of protein surface maps. It assigns a numerical score to the similarity between patterns of physicochemical features (charge, hydrophobicity) on the compared protein surfaces. This allows the recognition of clusters of proteins that have similar surfaces, and hence presumably similar functions. The server takes a set of protein coordinates as input and returns files with “spherical coordinates” of the proteins in PDB format and their graphical presentation, a matrix of mutual similarity values between the surfaces, and an unrooted tree, calculated by the neighbor-joining method, that represents the clustering of similar surfaces. SURF’S UP! facilitates comparative analysis of the physicochemical features of the surface, which are key determinants of protein function. Because it concentrates on coarse surface features, SURF’S UP! can work with models obtained from comparative modelling. Although it is designed to analyse conservation among homologs, it can also be used to compare the surfaces of non-homologous proteins with different three-dimensional folds, as long as the user supplies a functionally meaningful structural superposition. Another valuable characteristic of our method is that it makes no initial assumptions about the functional features to be compared. SURF’S UP! is freely available for academic researchers at http://asia.genesilico.pl/surfs_up/.
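The unrooted tree is built by neighbor-joining from a pairwise matrix. The sketch below is the textbook neighbor-joining algorithm, not the server's own code; converting surface similarities into distances (for example 1 − similarity) is an assumption left to the caller.

```python
def neighbor_joining(labels, dist):
    """Textbook neighbor-joining; returns an unrooted tree as a Newick string."""
    nodes = list(labels)
    # Distances keyed by node-name pairs; new internal nodes are named by their Newick text.
    dm = {(a, b): dist[i][j] for i, a in enumerate(labels) for j, b in enumerate(labels)}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dm[a, b] for b in nodes) for a in nodes}
        pairs = [(a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]]
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(pairs, key=lambda p: (n - 2) * dm[p] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * dm[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dm[i, j] - li
        u = "(%s:%g,%s:%g)" % (i, li, j, lj)
        # Distance from u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                dm[u, k] = dm[k, u] = 0.5 * (dm[i, k] + dm[j, k] - dm[i, j])
        dm[u, u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return "(%s,%s:%g);" % (a, b, dm[a, b])
```

On the classic five-taxon example matrix (a–e with d(a,b) = 5, d(d,e) = 3, and so on), this returns `((c:4,(a:2,b:3):3),(d:2,e:1):2);`. Ties in Q are broken by pair order, so the Newick text can differ between equally valid trees.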
pp 101-111 January 2007 Articles
An important component of functional genomics involves understanding protein association. Interfaces resulting from protein-protein interactions have been analysed on the basis of the length and number of peptide segments, both for specific interactions, as represented by homodimeric quaternary structures and by complexes formed by two independently occurring protein components, and for non-specific interactions, as observed in the crystal lattices of monomeric proteins. For every 1000 Å² of interface area contributed by a polypeptide chain, there are 3.4 segments in homodimers, 5.6 in complexes and 6.3 in crystal contacts. Concomitantly, segments are longest in homodimers (8.7 interface residues). Core segments (likely to contribute more towards binding) are more numerous in homodimers (1.7) than in crystal contacts (0.5), and this number can be used as one of the parameters to distinguish between the two types of interfaces. Dominant segments involved in specific interactions, along with their secondary structural features, are enumerated.
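Counting segments requires a rule for when interface residues belong to the same stretch of chain. The sketch assumes a hypothetical rule: consecutive interface residues separated by at most `max_gap` non-interface residues form one segment. The segment count is then normalised per 1000 Å² of interface area.

```python
def interface_segments(residues, max_gap=4):
    """Group interface residue numbers into segments.

    Hypothetical rule: residues separated by at most `max_gap`
    non-interface residues belong to the same segment.
    """
    segments = []
    for r in sorted(set(residues)):
        if segments and r - segments[-1][-1] - 1 <= max_gap:
            segments[-1].append(r)
        else:
            segments.append([r])
    return segments


def segments_per_1000(residues, interface_area, max_gap=4):
    """Number of segments per 1000 square angstroms of interface area."""
    return len(interface_segments(residues, max_gap)) * 1000.0 / interface_area
```

For example, interface residues 10, 11, 13, 20, 21 and 40 form three segments, `[[10, 11, 13], [20, 21], [40]]`, which for a 600 Å² interface gives 5.0 segments per 1000 Å².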
pp 113-127 January 2007 Articles
This paper first presents basic Petri net components representing the molecular interactions and mechanisms of signalling pathways, and introduces a method for constructing a Petri net model of a signalling pathway from these components. Then, using timed Petri nets, a simulation method for determining the delay time of each transition (i.e. the time taken for the transition to fire) is proposed, based on the simple principle that the number of tokens flowing into a place equals the number of tokens flowing out. Finally, the applicability of the proposed method is confirmed by observing signal transduction in simulation experiments, taking the apoptosis signalling pathways as an example.
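A minimal discrete-time timed Petri net helps show what a transition delay means: a transition consumes its input tokens when it starts firing and deposits its output tokens `delay` steps later. This sketch uses one common timed semantics, with each transition starting at most once per step; it is an assumption, not necessarily the semantics or delay-estimation method of the paper.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    name: str
    inputs: dict   # place name -> arc weight consumed when firing starts
    outputs: dict  # place name -> arc weight produced when firing completes
    delay: int     # firing duration in discrete time steps (>= 1)

    def __post_init__(self):
        if self.delay < 1:
            raise ValueError("delay must be at least one time step")


def simulate(marking, transitions, steps):
    """Discrete-time timed Petri net; each transition starts at most once per step."""
    marking = dict(marking)
    in_flight = []  # (completion_time, transition) pairs
    history = [dict(marking)]
    for t in range(1, steps + 1):
        # Deliver output tokens of firings that complete at time t.
        for done_at, tr in in_flight:
            if done_at == t:
                for place, w in tr.outputs.items():
                    marking[place] = marking.get(place, 0) + w
        in_flight = [(done_at, tr) for done_at, tr in in_flight if done_at != t]
        # Start every enabled transition, consuming its input tokens now.
        for tr in transitions:
            if all(marking.get(place, 0) >= w for place, w in tr.inputs.items()):
                for place, w in tr.inputs.items():
                    marking[place] -= w
                in_flight.append((t + tr.delay, tr))
        history.append(dict(marking))
    return marking, history
```

For a chain A → B → C with delays 2 and 3 and two initial tokens in A, `simulate({"A": 2}, [t1, t2], 7)` ends with both tokens in C: every token that flowed into B also flowed out.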
pp 129-144 January 2007 Articles
Complex processes resulting from the interaction of multiple elements can rarely be understood by analytical scientific approaches alone; mathematical models of system dynamics are also required. This insight, which disciplines like physics have long embraced, is gradually gaining importance in the study of cognitive processes by functional neuroimaging. In this field, causal mechanisms in neural systems are described in terms of effective connectivity. Recently, dynamic causal modelling (DCM) was introduced as a generic method to estimate effective connectivity from neuroimaging data in a Bayesian fashion. One of the key advantages of DCM over previous methods is that it distinguishes between neural state equations and modality-specific forward models that translate neural activity into a measured signal. Another strength is its natural relation to Bayesian model selection (BMS) procedures. In this article, we review the conceptual and mathematical basis of DCM and its implementation for functional magnetic resonance imaging data and event-related potentials. After introducing the application of BMS in the context of DCM, we conclude with an outlook on future extensions of DCM. These extensions are guided by the long-term goal of using dynamic system models for pharmacological and clinical applications, particularly with regard to synaptic plasticity.
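The neural state equation of DCM for fMRI is bilinear: dz/dt = (A + Σⱼ uⱼB⁽ʲ⁾)z + Cu, where A holds endogenous connections, B⁽ʲ⁾ the modulation of connections by input j, and C the driving inputs. The sketch integrates this equation with a simple forward Euler step for illustration only. It omits the hemodynamic forward model and all Bayesian estimation, and the example parameter values are hypothetical.

```python
def dcm_step(z, u, A, B, C, dt):
    """One Euler step of dz/dt = (A + sum_j u_j B[j]) z + C u.

    B[j][i][k] is the modulation by input j of the connection from region k to i.
    """
    n, m = len(z), len(u)
    dz = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            coupling = A[i][k] + sum(u[j] * B[j][i][k] for j in range(m))
            total += coupling * z[k]
        total += sum(C[i][j] * u[j] for j in range(m))
        dz.append(total)
    return [z[i] + dt * dz[i] for i in range(n)]


def simulate_dcm(A, B, C, inputs, dt, z0=None):
    """Integrate neural states for a sequence of input vectors (one per step)."""
    z = list(z0) if z0 is not None else [0.0] * len(A)
    trajectory = [z]
    for u in inputs:
        z = dcm_step(z, u, A, B, C, dt)
        trajectory.append(z)
    return trajectory


# Hypothetical two-region example: input drives region 1, which excites region 2.
A = [[-1.0, 0.0], [0.5, -1.0]]
B = [[[0.0, 0.0], [0.0, 0.0]]]  # no modulatory effect in this toy example
C = [[1.0], [0.0]]
```

With a sustained input, `simulate_dcm(A, B, C, [[1.0]] * 100, 0.01)` shows region 1 rising first and region 2 following more weakly through the A[1][0] connection.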
pp 145-155 January 2007 Articles
Many databases use their own structure and format to provide data describing biological processes. This heterogeneity makes large-scale systematic and automatic functional comparisons difficult. To overcome these problems, we have used the BioΨ formal description scheme, which allows multi-level representations of biological process information. Applying it to the description of the tricarboxylic acid (TCA) cycle, we show that BioΨ allows the formal integration of functional information existing in current databases and makes it available for further automated analysis. In addition, such a formal description of the TCA cycle process leads to more accurate biological process annotation that takes the biological context into account. This enables us to perform an automated comparison of the TCA cycles of seven different species based on processes rather than protein sequences. From current databases, BioΨ is able to unravel information that is already known to biologists but is not available to automated analysis tools and simulation software because of the lack of formal process descriptions. This use of the BioΨ description scheme to describe the TCA cycle was a key step in the MitoScop project, which aims to describe and simulate mitochondrial metabolism in silico.
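Comparing pathways by process rather than by sequence can be illustrated with set operations over formal step descriptions. The sketch below does not use BioΨ: it assumes a hypothetical representation in which each step is a (substrates, products) pair, with cofactors omitted, and the two organisms are invented for illustration only.

```python
def step(substrates, products):
    """One process step as an order-independent (substrates, products) pair."""
    return (frozenset(substrates), frozenset(products))


# Simplified TCA cycle; cofactors are omitted in this toy representation.
TCA_STEPS = [
    step(["oxaloacetate", "acetyl-CoA"], ["citrate"]),
    step(["citrate"], ["isocitrate"]),
    step(["isocitrate"], ["2-oxoglutarate"]),
    step(["2-oxoglutarate"], ["succinyl-CoA"]),
    step(["succinyl-CoA"], ["succinate"]),
    step(["succinate"], ["fumarate"]),
    step(["fumarate"], ["malate"]),
    step(["malate"], ["oxaloacetate"]),
]

# Hypothetical organisms: A has the full cycle, B lacks one step.
ORGANISM_A = set(TCA_STEPS)
ORGANISM_B = ORGANISM_A - {step(["2-oxoglutarate"], ["succinyl-CoA"])}


def compare_processes(a, b):
    """Shared and specific steps of two process descriptions, plus Jaccard similarity."""
    shared = a & b
    union = a | b
    return {
        "shared": shared,
        "only_first": a - b,
        "only_second": b - a,
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

Here `compare_processes(ORGANISM_A, ORGANISM_B)["jaccard"]` is 0.875 (7 of 8 steps shared), and the missing step is reported explicitly rather than inferred from sequence similarity.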
pp 157-167 January 2007 Articles
Biological phenomena at the cellular level can be represented by various types of mathematical formulations. Such representations allow us to carry out numerical simulations that provide mechanistic insights into complex behaviours of biological systems and also generate hypotheses that can be experimentally tested. Currently, we are particularly interested in spatio-temporal representations of dynamic cellular phenomena and how such models can be used to understand biological specificity in functional responses. This review describes the capabilities and limitations of the approaches used to study the spatio-temporal dynamics of cell signalling components.
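One of the simplest spatio-temporal representations of a signalling component is a one-dimensional reaction-diffusion equation, ∂u/∂t = D ∂²u/∂x² − ku, combining diffusion with first-order degradation. The toy finite-difference sketch below uses an explicit scheme with zero-flux boundaries. It is an illustration of the general idea, not one of the specific approaches reviewed, and the parameter values in the usage note are hypothetical.

```python
def simulate_1d(u0, D, k, dx, dt, steps):
    """Explicit finite-difference solution of du/dt = D d2u/dx2 - k u.

    Zero-flux (reflecting) boundaries are imposed by mirroring edge values.
    """
    if D * dt / dx ** 2 > 0.5:
        raise ValueError("explicit scheme unstable: reduce dt or increase dx")
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        new = []
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]
            right = u[i + 1] if i < n - 1 else u[i]
            laplacian = (left - 2.0 * u[i] + right) / dx ** 2
            new.append(u[i] + dt * (D * laplacian - k * u[i]))
        u = new
    return u
```

A pulse such as `simulate_1d([0, 0, 1, 0, 0], D=1.0, k=0.0, dx=1.0, dt=0.2, steps=10)` spreads symmetrically while conserving total amount; setting `k > 0` makes the total decay over time.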
pp 169-180 January 2007 Articles
Bioinformatics has made great contributions to genome and genomics research, without which the worldwide success of this and other global (‘omics’) approaches would not have been possible. More recently, it has developed further towards the analysis of different kinds of networks, thus laying the foundation for the comprehensive description, analysis and manipulation of whole living systems in modern “systems biology”. The next step necessary for developing a systems biology that deals with systemic phenomena is to expand existing methodologies and develop new ones appropriate for characterizing intercellular processes and interactions without omitting the underlying causal molecular mechanisms. Modelling the processes at the different levels of complexity involved requires a comprehensive integration of information on gene regulatory events, signal transduction pathways, protein interaction and metabolic networks, as well as cellular functions in the respective tissues/organs.