TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Megan Leszczynski; Daniel Y. Fu; Mayee F. Chen; Christopher Ré

arXiv:2204.08173·cs.CL·April 19, 2022

TL;DR

TABi introduces a type-aware bi-encoder approach that improves open-domain entity retrieval, especially for rare and ambiguous entities, by leveraging knowledge graph types during training without sacrificing overall performance.

Contribution

The paper proposes TABi, a novel joint training method for bi-encoders that incorporates knowledge graph types to enhance entity retrieval in open-domain tasks.

Findings

1. Improves retrieval of rare entities on AmbER datasets.
2. Maintains strong overall performance on KILT benchmark.
3. Robust to incomplete type systems with only 5% type coverage.

Abstract

Entity retrieval (retrieving information about entity mentions in a query) is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there are several challenges: (1) existing type-based retrieval methods require mention boundaries as input, but open-domain tasks run on unstructured text, (2) type-based methods should not compromise overall performance, and (3) type-based methods should be robust to noisy and missing types. In this work, we introduce TABi, a method to jointly train bi-encoders on knowledge graph types and unstructured text for entity retrieval for open-domain tasks. TABi leverages a type-enforced contrastive loss…
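
The abstract is cut off before it describes the type-enforced contrastive loss, so the following is only a minimal sketch of one plausible form, not the paper's exact formulation. It assumes a supervised contrastive loss applied twice over the same batch of embeddings: once with entity IDs as labels and once with knowledge graph type IDs as labels, mixed by a weight alpha. The function names (sup_con_loss, type_aware_loss), the alpha weighting, and the temperature value are all hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def sup_con_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch (illustrative sketch).

    Items that share a label are treated as positives for each other;
    every other item in the batch acts as a negative.
    """
    embs = [normalize(e) for e in embeddings]
    n = len(embs)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other item.
        logits = {j: dot(embs[i], embs[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        # Average negative log-probability of the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def type_aware_loss(embeddings, entity_ids, type_ids, alpha=0.5, temperature=0.1):
    """Hypothetical mix of a type-level and an entity-level contrastive term."""
    type_term = sup_con_loss(embeddings, type_ids, temperature)
    entity_term = sup_con_loss(embeddings, entity_ids, temperature)
    return alpha * type_term + (1 - alpha) * entity_term


# Toy batch: two query/entity pairs whose embeddings already cluster by pair.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
entity_ids = [0, 0, 1, 1]   # items 0 and 1 refer to the same entity, as do 2 and 3
type_ids = [7, 7, 8, 8]     # hypothetical type IDs aligned with the clusters
loss = type_aware_loss(batch, entity_ids, type_ids, alpha=0.5)
```

Under these assumptions, the type term pulls together embeddings whose entities share a type even when the entities differ, which is one way a retriever could be nudged away from a popularity-driven ranking. Setting alpha to 0 recovers a purely entity-level contrastive objective.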

Code & Models

Repositories

hazyresearch/tabi (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management