FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension
Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, Weizhu Chen

TL;DR
FusionNet introduces a fully-aware multi-level attention mechanism for machine comprehension. It achieves state-of-the-art results on SQuAD and on two adversarial SQuAD datasets by attending over every level of representation, from word-level embeddings up to the highest semantic-level features.
Contribution
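The paper proposes three things: a 'history of word' concept, an improved attention scoring function that makes better use of it, and a fully-aware multi-level attention mechanism that fuses the information in one text (such as a question) into its counterpart (such as a passage) layer by layer.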
Findings
Ranked first on the official SQuAD leaderboard for both single and ensemble models at the time of writing (Oct. 4th, 2017).
Set new state-of-the-art results on the adversarial SQuAD datasets AddSent and AddOneSent.
Improved the best reported F1 score on both adversarial datasets.
Abstract
This paper introduces a new neural structure called FusionNet, which extends existing attention approaches from three perspectives. First, it puts forward a novel concept of "history of word" to characterize attention information from the lowest word-level embedding up to the highest semantic-level representation. Second, it introduces an improved attention scoring function that better utilizes the "history of word" concept. Third, it proposes a fully-aware multi-level attention mechanism to capture the complete information in one text (such as a question) and exploit it in its counterpart (such as context or passage) layer by layer. We apply FusionNet to the Stanford Question Answering Dataset (SQuAD), where it achieves the first position for both single and ensemble models on the official SQuAD leaderboard at the time of writing (Oct. 4th, 2017). Meanwhile, we verify the generalization of…
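The three ideas in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the excerpt above does not state the form of the scoring function, so the sketch assumes a symmetric form, ReLU(U x) · diag(d) · ReLU(U y), which should be checked against the full paper. All function names, sizes, and weights below are hypothetical and chosen only for illustration.

```python
import math

def history_of_word(levels):
    """Concatenate every level's vector for each word into one long vector.

    levels: list of levels; each level is a list with one vector per word.
    """
    return [[x for vec in word_vecs for x in vec] for word_vecs in zip(*levels)]

def relu_project(U, x):
    """Compute ReLU(U x) for a matrix U (list of rows) and a vector x."""
    return [max(0.0, sum(u * v for u, v in zip(row, x))) for row in U]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fully_aware_attention(how_context, how_question, question_values, U, d):
    """Fuse one level of question vectors into each context word.

    Scores are computed from the full history of word, but only the
    vectors in `question_values` (a single level) are averaged.
    Assumed scoring form: ReLU(U x) . diag(d) . ReLU(U y).
    """
    proj_q = [relu_project(U, y) for y in how_question]
    width = len(question_values[0])
    fused = []
    for x in how_context:
        proj_x = relu_project(U, x)
        scores = [sum(a * w * b for a, w, b in zip(proj_x, d, py)) for py in proj_q]
        alpha = softmax(scores)  # attention weights over question words
        fused.append([sum(a * v[t] for a, v in zip(alpha, question_values))
                      for t in range(width)])
    return fused

# Toy data (hypothetical): 2 context words, 2 question words, 2 levels each.
context_levels = [[[1.0, 0.0], [0.0, 1.0]], [[0.5], [0.2]]]
question_levels = [[[1.0, 1.0], [0.0, 2.0]], [[0.3], [0.9]]]
U = [[0.5, -0.2, 0.1], [0.0, 0.4, 0.3]]  # hypothetical projection weights
d = [1.0, 0.5]                            # hypothetical diagonal of D

how_c = history_of_word(context_levels)
how_q = history_of_word(question_levels)
# Fuse the second (higher) question level into every context word.
fused = fully_aware_attention(how_c, how_q, question_levels[1], U, d)
print(fused)
```

Calling `fully_aware_attention` once per question level, each time with the same history-of-word inputs but a different `question_values`, gives the layer-by-layer fusion the abstract describes: the attention weights see every level, while each call outputs only one of them.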
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
