TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval
Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré

TL;DR
TABi introduces a type-aware bi-encoder approach that improves open-domain entity retrieval, especially for rare and ambiguous entities, by leveraging knowledge graph types during training without sacrificing overall performance.
Contribution
The paper proposes TABi, a novel joint training method for bi-encoders that incorporates knowledge graph types to enhance entity retrieval in open-domain tasks.
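A bi-encoder encodes the query and each candidate entity independently, then scores each pair with a similarity between their embeddings. Because of this, entity embeddings can be computed once and searched at query time. The sketch below illustrates only that generic retrieval step. It assumes hand-written toy vectors in place of trained encoders, and `retrieve`, the entity names, and the vectors are hypothetical, not taken from TABi's code.

```python
import math


def normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_emb, entity_embs, k=1):
    """Return the ids of the k entities whose embeddings score highest against the query."""
    q = normalize(query_emb)
    scored = [(dot(q, normalize(e)), eid) for eid, e in entity_embs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [eid for _, eid in scored[:k]]


# Hypothetical entity embeddings; in a real bi-encoder these come from the
# entity encoder and are computed once, ahead of query time.
entities = {
    "Lincoln (city)": [0.9, 0.1, 0.0],
    "Abraham Lincoln": [0.1, 0.9, 0.1],
    "Lincoln (car brand)": [0.0, 0.2, 0.9],
}
# Hypothetical output of the query encoder for a query mentioning "Lincoln".
query = [0.2, 0.95, 0.05]
print(retrieve(query, entities, k=1))  # prints ['Abraham Lincoln']
```

In practice the scoring loop would be replaced by a nearest-neighbor index over the precomputed entity embeddings. The toy example only shows that the ranking depends on nothing but the two independent embeddings.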
Findings
Improves retrieval of rare entities on AmbER datasets.
Maintains strong overall performance on KILT benchmark.
Robust to incomplete type systems with only 5% type coverage.
Abstract
Entity retrieval (retrieving information about entity mentions in a query) is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there are several challenges: (1) existing type-based retrieval methods require mention boundaries as input, but open-domain tasks run on unstructured text, (2) type-based methods should not compromise overall performance, and (3) type-based methods should be robust to noisy and missing types. In this work, we introduce TABi, a method to jointly train bi-encoders on knowledge graph types and unstructured text for entity retrieval for open-domain tasks. TABi leverages a type-enforced contrastive loss…
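The abstract is cut off at the type-enforced contrastive loss, so the sketch below does not reconstruct TABi's actual objective. As an illustration only, it shows a generic supervised contrastive (SupCon-style) loss in which items sharing a type label are treated as positives and pulled together in embedding space. The function name `type_contrastive_loss`, the temperature of 0.1, the type labels, and the toy embeddings are all hypothetical, and the sketch treats every item uniformly rather than distinguishing queries from entities.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def type_contrastive_loss(embs, types, temperature=0.1):
    """SupCon-style loss where embeddings sharing a type label are positives (illustrative)."""
    z = [normalize(e) for e in embs]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and types[j] == types[i]]
        if not positives:
            continue  # anchor has no same-type partner, so it is skipped
        # Temperature-scaled cosine similarity from anchor i to every other item.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        # Numerically stable log-sum-exp over all non-anchor items.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Average negative log-probability assigned to the same-type positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = type_contrastive_loss(embs, ["person", "person", "city", "city"])
mixed = type_contrastive_loss(embs, ["person", "city", "person", "city"])
print(aligned < mixed)  # True: same-type items already sit close together
```

The loss is low when same-type items are already close and high when type labels cut across embedding clusters, which is the kind of pressure a type-aware objective would apply during training.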
