Effectively integrating information content and structural relationship to improve the GO-based similarity measure between proteins
Bo Li, James Z. Wang, F. Alex Feltus, Jizhong Zhou, Feng Luo

TL;DR
This paper introduces SimIC, a new GO-based protein similarity measure that combines information content and structural hierarchy, improving protein similarity assessments and interaction predictions.
Contribution
The paper presents a novel similarity measure, SimIC, which integrates the information content of GO terms with the structure of the GO hierarchy to improve protein similarity assessment and protein interaction prediction.
Findings
SimIC improves the correlation between the GO-based similarities of yeast proteins and both their expression similarities and their sequence similarities.
SimIC outperforms existing methods in predicting yeast protein interactions.
All members of 159 MIPS complexes are identified in predicted PPIs.
Abstract
The Gene Ontology (GO) provides a knowledge base to effectively describe proteins. However, measuring similarity between proteins based on GO remains a challenge. In this paper, we propose a new similarity measure, the information coefficient similarity measure (SimIC), which integrates both the information content (IC) of GO terms and the structural information of the GO hierarchy to determine the similarity between proteins. In tests on yeast proteins, SimIC efficiently addresses the shallow annotation issue in GO and thus improves the correlations between the GO similarities of yeast proteins and their expression similarities, as well as between their GO similarities and their sequence similarities. Furthermore, we demonstrate that the proposed SimIC is superior in predicting yeast protein interactions. We predict 20484 yeast protein-protein interactions…
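As background for the term "information content", the sketch below computes IC(t) = -log p(t) on a tiny made-up ontology and then applies Lin's measure, a standard IC-based term similarity. It is not the SimIC formula, which this summary does not state. The term names, the annotation counts, and the function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO terms).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 4, "A1": 1, "A2": 1}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t); p(t) is the share of annotations at t or below it."""
    cumulative = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


def lin_similarity(t1, t2, ic):
    """Lin's measure: 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    common = ancestors(t1) & ancestors(t2)
    mica_ic = max(ic[t] for t in common)
    denom = ic[t1] + ic[t2]
    return 2 * mica_ic / denom if denom > 0 else 1.0


ic = information_content()
sibling_score = lin_similarity("A1", "A2", ic)   # siblings under "A"
unrelated_score = lin_similarity("A1", "B", ic)  # only "root" in common
```

In this toy setup a pure IC ratio such as Lin's returns 1 for any term compared with itself, however general that term is. This is one way annotations to shallow terms can look more similar than they should, and it may help motivate combining IC with structural information, as the abstract describes.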
Taxonomy
Topics: Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics · Computational Drug Discovery Methods
