Reading Wikipedia to Answer Open-Domain Questions

Danqi Chen; Adam Fisch; Jason Weston; Antoine Bordes

arXiv:1704.00051 · cs.CL · May 1, 2017 · 90 cites

TL;DR

This paper presents a system for open-domain question answering that uses Wikipedia as its sole knowledge source. It pairs a document retriever with a neural machine comprehension model, and both components prove competitive with existing counterparts on multiple QA datasets.

Contribution

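It combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network trained to detect answer spans in Wikipedia paragraphs, and shows that multitask learning with distant supervision turns the combination into an effective complete system for open-domain QA.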

Findings

01 Both retrieval and comprehension modules are highly competitive with existing counterparts.

02 Multitask learning with distant supervision improves overall system performance.

03 The approach is effective across multiple existing QA datasets.

Abstract

This paper proposes to tackle open-domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.
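
The search component described in the abstract (bigram hashing plus TF-IDF matching) can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the function names (`tokenize`, `hashed_ngrams`, `build_index`, `retrieve`), the bucket count, the use of `zlib.crc32` as the hash, and the particular TF-IDF weighting are all assumptions chosen to keep the example short and dependent only on the standard library.

```python
import math
import re
import zlib
from collections import Counter

# Illustrative bucket count (an assumption, not a value taken from this page).
NUM_BUCKETS = 2 ** 20


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (simplified)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hashed_ngrams(text):
    """Count unigrams and bigrams after hashing each one to a bucket id.

    zlib.crc32 is a stand-in hash chosen because it is deterministic
    across runs; any stable hash function would do for this sketch.
    """
    tokens = tokenize(text)
    grams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS for g in grams)


def build_index(docs):
    """Return sparse TF-IDF vectors (dicts) for each document and the IDF table."""
    counts = [hashed_ngrams(d) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n = len(docs)
    # Smoothed IDF; the exact weighting scheme is an assumption.
    idf = {b: math.log((n + 1) / (f + 1)) + 1.0 for b, f in doc_freq.items()}
    vectors = [{b: math.log1p(tf) * idf[b] for b, tf in c.items()} for c in counts]
    return vectors, idf


def retrieve(question, vectors, idf, k=5):
    """Rank documents by the dot product of question and document vectors."""
    q_counts = hashed_ngrams(question)
    q_vec = {b: math.log1p(tf) * idf[b] for b, tf in q_counts.items() if b in idf}
    scored = []
    for doc_id, d_vec in enumerate(vectors):
        score = sum(w * d_vec.get(b, 0.0) for b, w in q_vec.items())
        scored.append((score, doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


if __name__ == "__main__":
    docs = [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
        "The mitochondria is the powerhouse of the cell.",
    ]
    vectors, idf = build_index(docs)
    print(retrieve("What is the capital of France?", vectors, idf, k=2))
```

Hashing bigrams into a fixed number of buckets keeps the index size bounded regardless of vocabulary, at the cost of occasional collisions. Including bigrams lets the ranking reward local word order (for example "of france") rather than isolated terms. In a full pipeline, the top-ranked articles would then be passed to the neural reader that detects answer spans.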

Code & Models

Datasets

linagora/linto-dataset-text-ar-tn
dataset · 118 dl

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques