TL;DR
This paper introduces the attention-over-attention reader, a neural network for Cloze-style reading comprehension that places a second attention mechanism over the document-level attention and outperforms previous models on the CNN and Children's Book Test datasets.
Contribution
A novel attention-over-attention mechanism for reading comprehension that requires fewer pre-defined hyper-parameters than previous models, uses a simpler architecture, and yields significant performance gains.
Findings
Outperforms state-of-the-art models on the CNN dataset
Achieves superior results on the Children's Book Test dataset
Requires fewer pre-defined hyper-parameters than previous models
Abstract
Cloze-style queries are representative problems in reading comprehension. Over the past few months, we have seen much progress in using neural network approaches to solve Cloze-style questions. In this paper, we present a novel model called the attention-over-attention reader for the Cloze-style reading comprehension task. Our model places another attention mechanism over the document-level attention and induces "attended attention" for final predictions. Unlike previous works, our neural network model requires fewer pre-defined hyper-parameters and uses an elegant architecture for modeling. Experimental results show that the proposed attention-over-attention model outperforms various state-of-the-art systems by a large margin on public datasets such as the CNN and Children's Book Test datasets.
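The idea of placing one attention over another can be illustrated with a small sketch. The abstract does not spell out the scoring or normalization, so the following is only one plausible reading: it assumes dot-product matching scores between document and query word vectors, a softmax over document positions for each query word (the document-level attention), and a second softmax over query words, averaged across the document, to weight those attentions. The function names and the toy vectors are hypothetical and are not taken from the paper's code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_attention(doc_vecs, query_vecs):
    """Return one attention weight per document position.

    doc_vecs:   list of |D| vectors (hypothetical contextual embeddings)
    query_vecs: list of |Q| vectors with the same dimension
    """
    n_doc, n_query = len(doc_vecs), len(query_vecs)

    # Assumed matching score: dot product of each document/query word pair.
    scores = [
        [sum(d * q for d, q in zip(dv, qv)) for qv in query_vecs]
        for dv in doc_vecs
    ]

    # Document-level attention: for each query word, a distribution
    # over document positions.
    doc_att = [
        softmax([scores[i][j] for i in range(n_doc)]) for j in range(n_query)
    ]

    # Second attention: for each document position, a distribution over
    # query words, averaged over positions to get one weight per query word.
    query_att = [softmax(scores[i]) for i in range(n_doc)]
    query_weights = [
        sum(row[j] for row in query_att) / n_doc for j in range(n_query)
    ]

    # "Attended attention": document-level attentions combined using the
    # query-word weights.
    return [
        sum(doc_att[j][i] * query_weights[j] for j in range(n_query))
        for i in range(n_doc)
    ]


# Toy example with made-up 2-dimensional vectors.
doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [[1.0, 0.0], [0.5, 0.5]]
att = attention_over_attention(doc, query)
```

Because each document-level attention sums to one and the query-word weights also sum to one, the result is itself a distribution over document positions. A Cloze-style prediction could then be made by summing these weights over every position where a candidate word occurs and picking the highest-scoring candidate.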
