INtERAcT: Interaction Network Inference from Vector Representations of Words

Matteo Manica; Roland Mathis; María Rodríguez Martínez

arXiv:1801.03011·q-bio.MN·November 7, 2019

TL;DR

INtERAcT is an unsupervised method that extracts protein-protein interactions from biomedical literature using vector word representations, outperforming existing methods and applicable across various scientific domains without manual curation.

Contribution

The paper introduces INtERAcT, a novel unsupervised approach leveraging vector embeddings to infer molecular interactions from text, eliminating the need for manual annotation or semantic rules.

Findings

01 Outperforms existing similarity metrics in identifying known interactions

02 Successfully reconstructs molecular pathways for 10 cancer types

03 Operates without manual curation or semantic rules

Abstract

In recent years, the number of biomedical publications has grown steadily, resulting in a rich source of untapped new knowledge. Most biomedical facts, however, are not readily available but buried in unstructured text, and hence their exploitation requires the time-consuming manual curation of published articles. Here we present INtERAcT, a novel approach to extract protein-protein interactions from a corpus of biomedical articles related to a broad range of scientific domains in a completely unsupervised way. INtERAcT exploits vector representations of words, computed on a corpus of domain-specific knowledge, and implements a new metric that estimates an interaction score between two molecules in the space where the corresponding words are embedded. We demonstrate the power of INtERAcT by reconstructing the molecular pathways associated with 10 different cancer types using…
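To make the idea of scoring a pair of molecules in embedding space concrete, here is a minimal sketch. It does not implement the INtERAcT metric, which this page does not specify. It uses plain cosine similarity as a stand-in, the kind of generic similarity baseline a new metric would be compared against. The entity names (`protA`, `protB`, `protC`), the three-dimensional vectors, and the helper names `cosine_similarity` and `rank_pairs` are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings (made-up numbers). In practice the vectors
# would come from a word-embedding model trained on a biomedical corpus.
EMBEDDINGS = {
    "protA": [0.9, 0.1, 0.3],
    "protB": [0.8, 0.2, 0.4],
    "protC": [0.1, 0.9, 0.2],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors: no similarity
    return dot / (norm_u * norm_v)


def rank_pairs(embeddings):
    """Score every unordered pair of entities and sort by descending score.

    NOTE: cosine similarity is a baseline stand-in here, not the
    interaction score defined in the paper.
    """
    scored = [
        (a, b, cosine_similarity(embeddings[a], embeddings[b]))
        for a, b in combinations(sorted(embeddings), 2)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for a, b, score in rank_pairs(EMBEDDINGS):
        print(f"{a}-{b}: {score:.3f}")
```

In this toy setup, pairs whose vectors point in similar directions rank first. Thresholding the ranked list would give a candidate interaction network, which is the kind of output the abstract describes at a much larger scale.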
