Systematic comparison of the protein-protein interaction databases from a user's perspective

A.K. Bajpai; S. Davuluri; K. Tiwary; S. Narayanan; S. Oguru; K. Basavaraju; D. Dayalan; Kavitha Thirumurugan; K.K. Acharya

doi:10.1016/j.jbi.2020.103380

In the absence of periodic systematic comparisons, biologists and bioinformaticians may be forced to choose subjectively among the many protein–protein interaction (PPI) databases and tools. We conducted a comprehensive compilation and comparison of such resources. We compiled 375 PPI resources, short-listed 125 important ones (both lists are available at startbioinfo.com), and compared the features and coverage of 16 carefully selected databases related to human PPIs. For these 16 databases, we quantitatively compared the coverage of ‘experimentally verified’ as well as ‘total’ (experimentally verified and predicted) PPIs. Coverage was compared in two ways. (a) PPIs retrieved in response to gene queries through the web interfaces were compared. The query set comprised 108 genes that were either expressed differently across tissues (specific to kidney, testis, or uterus, or ubiquitous, i.e., expressed in 43 normal human tissues) or associated with certain diseases (breast cancer, lung cancer, Alzheimer's, cystic fibrosis, diabetes, and cardiomyopathy). Coverage was also compared for well-studied versus less-studied genes, and coverage of high-quality interactions was assessed separately using a set of literature-curated, experimentally proven PPIs (gold-standard PPI set). (b) The back-end data from 15 PPI databases were downloaded and compared. Combined results from STRING and UniHI covered around 84% of ‘experimentally verified’ PPIs. Approximately 94% of the ‘total’ PPIs available across the databases were retrieved by the combined use of hPRINT, STRING, and IID. Of the experimentally verified PPIs found in only one database, STRING contributed around 71%. The coverage of certain databases was skewed for some gene types. Analysis with the gold-standard PPI set revealed that GPS-Prot, STRING, APID, and HIPPIE each covered ~70% of the curated interactions. Database usage frequencies did not always correlate with the databases' respective advantages, which justifies more frequent studies of this nature. © 2020 Elsevier Inc.
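
The coverage figures above (share of all known PPIs covered by one database or by a combination, and PPIs found exclusively in one database) can be illustrated with plain set arithmetic. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the database names `db_a`, `db_b`, `db_c` and their contents are invented toy data, each PPI is treated as an unordered pair of gene symbols, and the "total" is taken to be the union over the databases supplied.

```python
from itertools import combinations


def normalize(pairs):
    """Treat each PPI as an unordered pair so (A, B) and (B, A) match."""
    return {frozenset(pair) for pair in pairs}


def coverage_report(databases):
    """Return per-database coverage of the union and count of exclusive PPIs."""
    sets = {name: normalize(pairs) for name, pairs in databases.items()}
    universe = set().union(*sets.values())
    report = {}
    for name, ppis in sets.items():
        # PPIs present in any database other than the current one
        others = set().union(*(s for n, s in sets.items() if n != name))
        report[name] = {
            "coverage": len(ppis) / len(universe),
            "exclusive": len(ppis - others),
        }
    return report


def best_combination(databases, k):
    """Exhaustively find the k databases whose union covers the most PPIs."""
    sets = {name: normalize(pairs) for name, pairs in databases.items()}
    universe = set().union(*sets.values())

    def covered(combo):
        return set().union(*(sets[n] for n in combo))

    best = max(combinations(sets, k), key=lambda combo: len(covered(combo)))
    return best, len(covered(best)) / len(universe)


# Hypothetical toy data; not taken from any real database.
toy = {
    "db_a": [("TP53", "MDM2"), ("BRCA1", "BARD1"), ("EGFR", "GRB2")],
    "db_b": [("MDM2", "TP53"), ("BRCA1", "BARD1")],
    "db_c": [("EGFR", "GRB2"), ("CFTR", "SLC9A3R1")],
}

report = coverage_report(toy)
combo, combined_coverage = best_combination(toy, 2)
print(report)
print(combo, combined_coverage)
```

With real downloads, the main extra work would be mapping every database's protein or gene identifiers to a common namespace before building the sets; the exhaustive search over combinations is feasible here only because the number of databases is small.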

Journal	Journal of Biomedical Informatics
Publisher	Academic Press Inc.
ISSN	1532-0464