RoR: Read-over-Read for Long Document Machine Reading Comprehension

Jing Zhao, Junwei Bao, Yifan Wang, Yongwei Zhou, Youzheng Wu, Xiaodong He, and Bowen Zhou

arXiv:2109.04780 · cs.CL · September 15, 2021

TL;DR

RoR is a read-over-read approach to long document machine reading comprehension that expands the reading field from a single chunk to the whole document. It combines chunk-level and document-level reading with answer aggregation and reports state-of-the-art results on QuAC and TriviaQA.

Contribution

The paper proposes RoR, a novel method that enables effective long document reading by integrating chunk-level and document-level comprehension with answer voting.

Findings

01

RoR achieves state-of-the-art results on QuAC and TriviaQA.

02

RoR ranks 1st on the QuAC leaderboard as of May 2021.

03

The method handles documents longer than the encoder's input limit (e.g., 512 WordPiece tokens).

Abstract

Transformer-based pre-trained models, such as BERT, have achieved remarkable results on machine reading comprehension. However, due to the constraint on encoding length (e.g., 512 WordPiece tokens), a long document is usually split into multiple chunks that are read independently. As a result, the reading field is limited to individual chunks, with no information shared across chunks of the long document. To address this problem, we propose RoR, a read-over-read method, which expands the reading field from chunk to document. Specifically, RoR includes a chunk reader and a document reader. The former first predicts a set of regional answers for each chunk, which are then compacted into a highly condensed version of the original document that is guaranteed to be encoded in a single pass. The latter further predicts the global answers from this condensed document. Eventually, a…
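The chunk-then-document flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Reader` interface, every function name, the greedy score-based selection used to build the condensed document, and the `max_len` and `stride` defaults are all hypothetical. The final aggregation step is left out because the abstract is truncated before describing it.

```python
from typing import Callable, List, Tuple

# (start, end_exclusive, score). Hypothetical span format, not from the paper.
Span = Tuple[int, int, float]
# Hypothetical reader interface: (question, tokens) -> candidate answer spans,
# with offsets relative to the tokens it was given.
Reader = Callable[[str, List[str]], List[Span]]


def split_into_chunks(tokens: List[str], max_len: int,
                      stride: int) -> List[Tuple[int, List[str]]]:
    """Split tokens into overlapping windows of at most max_len tokens."""
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks


def condense(tokens: List[str], question: str, chunk_reader: Reader,
             max_len: int, stride: int) -> Tuple[List[str], List[Span]]:
    """Chunk-reader pass: gather regional answers from every chunk, then
    build a condensed document that fits into one encoder window."""
    regional: List[Span] = []
    for offset, chunk in split_into_chunks(tokens, max_len, stride):
        for s, e, score in chunk_reader(question, chunk):
            # Map chunk-relative offsets back to document offsets.
            regional.append((offset + s, offset + e, score))

    # Assumed selection rule: take the highest-scoring non-overlapping
    # spans until the token budget (max_len) is used up.
    keep: List[Span] = []
    budget = max_len
    for s, e, score in sorted(regional, key=lambda r: -r[2]):
        fits = e - s <= budget
        disjoint = all(e <= ks or s >= ke for ks, ke, _ in keep)
        if fits and disjoint:
            keep.append((s, e, score))
            budget -= e - s

    keep.sort()  # restore document order
    condensed = [tok for s, e, _ in keep for tok in tokens[s:e]]
    return condensed, regional


def read_over_read(tokens: List[str], question: str, chunk_reader: Reader,
                   document_reader: Reader, max_len: int = 512,
                   stride: int = 128) -> Tuple[List[Span], List[Span]]:
    """Return (regional answers, global answers). Global answer offsets
    are relative to the condensed document."""
    condensed, regional = condense(tokens, question, chunk_reader,
                                   max_len, stride)
    global_answers = document_reader(question, condensed)
    return regional, global_answers
```

In this sketch both readers share one interface, so any model that returns scored spans could be plugged in for either pass. The overlapping windows mean the same span can be proposed by two adjacent chunks, which is why the selection step skips overlapping candidates.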

Code & Models

Repositories

jd-ai-research-nlp/ror
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Softmax · Attention Dropout · Dense Connections · Dropout · Adam