We have analysed the genomes of representatives of three kingdoms of life, namely archaea, eubacteria and eukaryota, using data mining tools based on compositional analyses of the protein sequences. The representatives chosen for this analysis were Methanococcus jannaschii, Haemophilus influenzae and Saccharomyces cerevisiae. We have identified features of the protein evolution patterns that the three genomes share and features in which they differ. M. jannaschii has a greater number of proteins with more charged amino acids, whereas S. cerevisiae has a greater number of hydrophilic proteins. Despite these differences in intrinsic compositional characteristics between the proteins from the different genomes, we have also identified certain common characteristics. We carried out exploratory Principal Component Analysis of the multivariate data on the proteins of each organism in an effort to classify the proteins into clusters. Interestingly, most of the proteins in each organism cluster closely together, but there are a few ‘outliers’. We focus on these outliers for functional investigation, which may help reveal unique features of the biology of the respective organisms.
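The workflow summarised above (compositional features per protein, exploratory Principal Component Analysis, then attention to the proteins lying far from the main cluster) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the feature set (fractions of the 20 standard amino acids), the use of power iteration to obtain the first two components, the outlier rule (distance in component space above the mean plus two standard deviations), the function names, and the toy sequences, which are invented and are not proteins from any of the three genomes.

```python
import math
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in a sequence."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def covariance(centered):
    """Sample covariance matrix of mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def principal_components(cov, k=2, iters=300, seed=0):
    """Approximate the top-k eigenvectors by power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    rng = random.Random(seed)      # seeded so the sketch is reproducible
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        eigval = 0.0
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:       # no variance left in any direction
                break
            v = [x / norm for x in w]
            eigval = norm
        components.append(v)
        # Deflate: remove the variance explained by this component.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigval * v[i] * v[j]
    return components


def find_outliers(proteins, k=2.0):
    """Names of proteins far from the centroid in the first two components.

    The cutoff (mean + k * standard deviation of the distances) is an
    assumed rule for this sketch, not the criterion used in the study.
    """
    names = list(proteins)
    rows = [composition(proteins[name]) for name in names]
    means = [sum(col) / len(rows) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    pcs = principal_components(covariance(centered))
    scores = [[sum(x * c for x, c in zip(row, pc)) for pc in pcs]
              for row in centered]
    dists = [math.sqrt(sum(s * s for s in score)) for score in scores]
    cutoff = statistics.mean(dists) + k * statistics.pstdev(dists)
    return [name for name, dist in zip(names, dists) if dist > cutoff]


# Toy data (invented): eight near-identical sequences and one made up
# entirely of charged residues.
base = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
proteins = {f"typical_{i}": base[:i] + "A" + base[i + 1:] for i in range(8)}
proteins["charged_rich"] = "KKEEDDRR" * 4

print(find_outliers(proteins))
```

On real proteomes a numerical library would normally replace the hand-written eigenvector routine, and the cutoff would need to be chosen with the actual distribution of scores in view; the pure-Python version is used here only to keep the sketch self-contained.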
Volume 45, 2020