INtERAcT: Interaction Network Inference from Vector Representations of Words
Matteo Manica, Roland Mathis, María Rodríguez Martínez

TL;DR
INtERAcT is an unsupervised method that extracts protein-protein interactions from biomedical literature using vector word representations, outperforming existing methods and applicable across various scientific domains without manual curation.
Contribution
The paper introduces INtERAcT, a novel unsupervised approach leveraging vector embeddings to infer molecular interactions from text, eliminating the need for manual annotation or semantic rules.
Findings
Outperforms existing similarity metrics in identifying known interactions
Successfully reconstructs molecular pathways for 10 cancer types
Operates without manual curation or semantic rules
Abstract
In recent years, the number of biomedical publications has grown steadily, resulting in a rich source of untapped new knowledge. Most biomedical facts, however, are not readily available but buried in unstructured text, so exploiting them requires time-consuming manual curation of published articles. Here we present INtERAcT, a novel approach to extract protein-protein interactions from a corpus of biomedical articles related to a broad range of scientific domains in a completely unsupervised way. INtERAcT exploits vector representations of words, computed on a corpus of domain-specific knowledge, and implements a new metric that estimates an interaction score between two molecules in the space where the corresponding words are embedded. We demonstrate the power of INtERAcT by reconstructing the molecular pathways associated with 10 different cancer types using…
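To make the idea of scoring molecule pairs in an embedding space concrete, here is a minimal sketch in Python. It is not INtERAcT's metric, which the abstract does not specify; it uses plain cosine similarity, a generic similarity measure, as a stand-in. The embedding values, the `EMBEDDINGS` table, and the function names `cosine_similarity` and `rank_pairs` are all hypothetical and invented for illustration.

```python
import math
from itertools import combinations

# Toy 3-dimensional word vectors. The numbers are invented for illustration;
# real embeddings would be learned from a corpus of biomedical text.
EMBEDDINGS = {
    "TP53": [0.9, 0.1, 0.2],
    "MDM2": [0.8, 0.2, 0.3],
    "INS": [0.1, 0.9, 0.1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_pairs(embeddings):
    """Score every unordered pair of words and sort from highest to lowest score.

    Assumption: cosine similarity stands in for an interaction score here;
    it is a generic baseline, not the metric introduced in the paper.
    """
    scored = [
        (cosine_similarity(embeddings[a], embeddings[b]), a, b)
        for a, b in combinations(sorted(embeddings), 2)
    ]
    return sorted(scored, reverse=True)


ranked = rank_pairs(EMBEDDINGS)
for score, a, b in ranked:
    print(f"{a}-{b}: {score:.3f}")
```

In this toy setup, a candidate interaction network could be read off by keeping only the pairs whose score exceeds a chosen threshold. How that threshold is set, and how the score itself is defined, is where a dedicated metric would differ from this baseline.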
