KILT: a Benchmark for Knowledge Intensive Language Tasks

Fabio Petroni; Aleksandra Piktus; Angela Fan; Patrick Lewis; Majid Yazdani; Nicola De Cao; James Thorne; Yacine Jernite; Vladimir Karpukhin; Jean Maillard; Vassilis Plachouras; Tim Rocktäschel; Sebastian Riedel

arXiv:2009.02252 · cs.CL · May 28, 2021

TL;DR

KILT introduces a unified benchmark for knowledge-intensive language tasks using a shared Wikipedia snapshot, enabling fair comparison of models and fostering development of general, memory-augmented NLP systems.

Contribution

The paper presents KILT, a comprehensive benchmark for multiple knowledge-intensive tasks based on a common data source, facilitating research on general models and memory architectures.

Findings

01

A shared dense vector index combined with a seq2seq model performs well across tasks.

02

Models can effectively provide provenance information.

03

Competitive results are achieved on multiple knowledge-intensive tasks.
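
The first two findings can be illustrated with a toy sketch: one index over a shared knowledge source serves queries from any task, and the retrieved page ids double as provenance that can be scored against gold pages. Everything below is hypothetical and for illustration only: the page ids, the three-dimensional vectors, and the helper names `retrieve` and `r_precision` are assumptions, and R-precision is used here simply as one page-level retrieval metric, not as a statement of how the benchmark's official scorer is implemented.

```python
# Hypothetical toy knowledge source: page id -> dense embedding.
# A real system would embed Wikipedia passages with a trained encoder.
SHARED_INDEX = {
    "page_a": [1.0, 0.0, 0.0],
    "page_b": [0.0, 1.0, 0.0],
    "page_c": [0.7, 0.7, 0.0],
}


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec, k=2):
    """Return the k page ids with the highest inner product to the query.

    The same index is used regardless of which task produced the query.
    """
    ranked = sorted(
        SHARED_INDEX,
        key=lambda pid: dot(query_vec, SHARED_INDEX[pid]),
        reverse=True,
    )
    return ranked[:k]


def r_precision(retrieved, gold):
    """Fraction of the top-R retrieved pages that are gold, with R = len(gold)."""
    r = len(gold)
    if r == 0:
        return 0.0
    hits = sum(1 for pid in retrieved[:r] if pid in gold)
    return hits / r


# A query vector from any task (question answering, fact checking, ...).
provenance = retrieve([1.0, 0.2, 0.0], k=2)
score = r_precision(provenance, {"page_a"})
```

A downstream generator would condition on the text of the retrieved pages; returning the page ids alongside the output is what makes the prediction checkable against a gold provenance set.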

Abstract

Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research on models that condition on specific information in large textual resources, we present a benchmark for knowledge-intensive language tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia, reducing engineering turnaround through the re-use of components, as well as accelerating research into task-agnostic memory architectures. We test both task-specific and general baselines, evaluating downstream performance in addition to the ability of the models to provide…

Code & Models

Repositories

Models

facebook/genre-kilt (model · 697 downloads · 14 likes)

Datasets

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence