Tandemly repeated structural motifs in proteins form highly stable structural folds and provide multiple binding sites associated with diverse functional roles. The tertiary structure and function of these proteins are determined by the type and copy number of the repeating units. Each repeat type exhibits a unique pattern of intra- and inter-repeat unit interactions that is well captured by the topological features of the network representation of protein structures. Here we present PRIGSA2, an improved version of our graph-based algorithm PRIGSA, with structure-based validation and filtering steps incorporated for accurate detection of tandem structural repeats. The algorithm integrates available knowledge on repeat families with de novo prediction to detect repeats in single monomer chains as well as in multimeric protein complexes. Three levels of performance evaluation are presented: comparison with state-of-the-art algorithms on a benchmark dataset of repeat and non-repeat proteins, accuracy in detecting members of 13 known repeat families reported in UniProt, and execution on the complete Protein Data Bank (PDB) to show its ability to identify previously uncharacterized proteins. Executing the algorithm on the PDB yields a ~3-fold increase in the coverage of members of the 13 known families and identifies 3408 novel uncharacterized structural repeat proteins. PRIGSA2 is available at http://bioinf.iiit.ac.in/PRIGSA2/.
Volume 46, 2020