TL;DR
This paper introduces a distantly-supervised approach to jointly learning embeddings of entities and text from unannotated corpora. It reduces reliance on costly structured resources and yields embeddings that capture entity similarity and relatedness better than prior work.
Contribution
The authors propose a method for jointly embedding entities and text that requires only a list of mappings between entities and their surface forms. Because it needs neither human-annotated text nor a large knowledge graph, it can be applied to new domains and corpora.
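The following minimal Python sketch (not the authors' implementation) shows how a plain surface-form list can provide distant supervision. It assumes a hypothetical dictionary `surface_forms` that maps lowercased token tuples to candidate entity IDs. It finds mentions by greedy longest match and emits (entity, context word) pairs of the kind a skip-gram style model could train on. The function names, the window size, the matching strategy, and the choice to give every candidate of an ambiguous surface form the same context are all illustrative assumptions.

```python
def find_mentions(tokens, surface_forms, max_len=3):
    """Greedy longest-match search for known surface forms (assumed strategy).

    surface_forms: hypothetical dict mapping a lowercased token tuple
    to the set of entity IDs that surface form may refer to.
    Returns a list of (start, end, entity_ids) spans, end exclusive.
    """
    lowered = [t.lower() for t in tokens]
    mentions = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in surface_forms:
                match = (i, i + n, surface_forms[key])
                break
        if match:
            mentions.append(match)
            i = match[1]  # skip past the matched span
        else:
            i += 1
    return mentions


def entity_context_pairs(tokens, surface_forms, window=2):
    """Emit (entity_id, context_word) pairs for every detected mention.

    Ambiguous surface forms naively give the same context to every
    candidate entity; this is a simplifying assumption of the sketch.
    """
    pairs = []
    for start, end, entity_ids in find_mentions(tokens, surface_forms):
        context = tokens[max(0, start - window):start] + tokens[end:end + window]
        for eid in sorted(entity_ids):
            for word in context:
                pairs.append((eid, word.lower()))
    return pairs


if __name__ == "__main__":
    # Hypothetical entity IDs, for illustration only.
    sf = {("aspirin",): {"ENT_ASPIRIN"}, ("chest", "pain"): {"ENT_CHEST_PAIN"}}
    toks = "the patient was given aspirin for chest pain".split()
    print(entity_context_pairs(toks, sf))
```

In this toy sentence, "aspirin" and "chest pain" are matched as mentions. Each one contributes its neighboring words as training context, which is the signal that lets entity vectors be learned alongside word vectors without annotated text.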
Findings
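The embeddings capture entity similarity and relatedness better than prior methods on existing biomedical datasets.
They also outperform prior work on a new Wikipedia-based entity similarity and relatedness dataset released by the authors.
Analogy completion and entity sense disambiguation results indicate that entities and words encode complementary information for NLP tasks.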
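Entity similarity and relatedness datasets are commonly scored by correlating a model's cosine similarities with human judgments using Spearman's rank correlation. This summary does not state the exact metric, so the sketch below assumes that protocol. It also assumes a hypothetical `embeddings` dict from entity ID to vector. Pairs involving an entity without a vector are skipped and the number of covered pairs is reported. Only the standard library is used, so tied ranks are averaged by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate(embeddings, judged_pairs):
    """Score (entity1, entity2, human_score) triples; return (rho, covered_pairs).

    embeddings: hypothetical dict of entity ID -> vector.
    Pairs with a missing entity are skipped (an assumed coverage policy).
    """
    model_scores, human_scores = [], []
    for e1, e2, human in judged_pairs:
        if e1 in embeddings and e2 in embeddings:
            model_scores.append(cosine(embeddings[e1], embeddings[e2]))
            human_scores.append(human)
    return spearman(model_scores, human_scores), len(model_scores)


if __name__ == "__main__":
    # Toy vectors and judgments, for illustration only.
    emb = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.5, 0.5]}
    gold = [("A", "B", 9.0), ("A", "D", 5.0), ("A", "C", 1.0), ("A", "X", 4.0)]
    print(evaluate(emb, gold))
```

Reporting coverage alongside the correlation matters when methods are compared, because embeddings that skip hard or rare entities can otherwise look better than they are.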
Abstract
Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomedical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture…
