FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions

Qiming Wang; Raul Castro Fernandez

arXiv:2408.09226 · cs.IR · August 20, 2024

Open Access

TL;DR

FabricQA-Extractor is a question answering system that leverages relational structure knowledge to improve large-scale information extraction from unstructured documents.

Contribution

The paper introduces Relation Coherence, a novel model that exploits relational structure, integrated into FabricQA-Extractor for enhanced extraction performance.

Findings

01. Relation Coherence improves extraction accuracy.

02. FabricQA-Extractor performs well on large-scale datasets.

03. The system efficiently handles millions of documents.

Abstract

Reading comprehension models answer questions posed in natural language when provided with a short passage of text. They present an opportunity to address a long-standing challenge in data management: the extraction of structured data from unstructured text. Consequently, several approaches are using these models to perform information extraction. However, these modern approaches leave an opportunity behind because they do not exploit the relational structure of the target extraction table. In this paper, we introduce a new model, Relation Coherence, that exploits knowledge of the relational structure to improve the extraction quality. We incorporate the Relation Coherence model as part of FabricQA-Extractor, an end-to-end system we built from scratch to conduct large-scale extraction tasks over millions of documents. We demonstrate on two datasets with millions of passages that…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques