Why can't memory networks read effectively?
Simon Šuster, Madhumita Sushil, Walter Daelemans

TL;DR
This paper investigates why vanilla memory networks perform poorly in reading comprehension tasks, identifying key issues and proposing simple modifications to improve their effectiveness, while also highlighting the impact of unseen answers on evaluation.
Contribution
The paper reveals the reasons behind the ineffectiveness of vanilla memory networks in single-hop reading comprehension and proposes simple network adaptations as remedies.
Findings
Output classification layer with entity-specific weights hampers performance.
Flat attention distributions lead to poor passage information aggregation.
Unseen answers at test time significantly affect evaluation results.
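
One way to make the "flat attention" finding concrete is to measure how close an attention distribution is to uniform. The sketch below is illustrative only and is not taken from the paper: the function names (`softmax`, `normalized_entropy`) and the toy scores are assumptions. It computes the entropy of the attention weights divided by the maximum possible entropy, so a value near 1 indicates a nearly flat distribution and a value near 0 indicates a sharply peaked one.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Entropy of `probs` divided by log(n): 1.0 is uniform, 0.0 is one-hot."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


# Toy scores over four passage positions (made-up values, not from the paper).
flat_attention = softmax([0.1, 0.0, 0.2, 0.1])
peaked_attention = softmax([8.0, 0.0, 0.2, 0.1])

print(normalized_entropy(flat_attention))    # close to 1: nearly uniform
print(normalized_entropy(peaked_attention))  # close to 0: one position dominates
```

When attention is nearly uniform, the weighted sum over passage positions approaches a plain average of the memory vectors, which is one plausible reading of why such distributions aggregate passage information poorly.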
Abstract
Memory networks have been a popular choice among neural architectures for machine reading comprehension and question answering. While recent work revealed that memory networks can't truly perform multi-hop reasoning, we show in the present paper that vanilla memory networks are ineffective even in single-hop reading comprehension. We analyze the reasons for this on two cloze-style datasets, one from the medical domain and another including children's fiction. We find that the output classification layer with entity-specific weights, and the aggregation of passage information with relatively flat attention distributions are the most important contributors to poor results. We propose network adaptations that can serve as simple remedies. We also find that the presence of unseen answers at test time can dramatically affect the reported results, so we suggest controlling for this factor…
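
The suggestion to control for unseen answers can be sketched as a simple evaluation split. The code below is a hypothetical illustration, not the authors' procedure: the data layout (dicts with `id` and `answer` keys), the helper names, and the toy examples are all assumptions. It separates test examples by whether their gold answer occurs among the training answers, then reports accuracy on each part.

```python
def split_by_answer_seen(train_answers, test_examples):
    """Partition test examples by whether the gold answer appeared in training."""
    seen = set(train_answers)
    seen_part = [ex for ex in test_examples if ex["answer"] in seen]
    unseen_part = [ex for ex in test_examples if ex["answer"] not in seen]
    return seen_part, unseen_part


def accuracy(examples, predictions):
    """Exact-match accuracy; returns None for an empty subset."""
    if not examples:
        return None
    correct = sum(1 for ex in examples if predictions.get(ex["id"]) == ex["answer"])
    return correct / len(examples)


# Toy data (made-up, for illustration only).
train_answers = ["aspirin", "fever", "alice"]
test_examples = [
    {"id": 1, "answer": "aspirin"},
    {"id": 2, "answer": "fever"},
    {"id": 3, "answer": "ibuprofen"},  # this answer never occurs in training
]
predictions = {1: "aspirin", 2: "alice", 3: "aspirin"}

seen_part, unseen_part = split_by_answer_seen(train_answers, test_examples)
print("overall:", accuracy(test_examples, predictions))
print("seen answers:", accuracy(seen_part, predictions))
print("unseen answers:", accuracy(unseen_part, predictions))
```

Reporting the two subsets separately makes it visible when a model that classifies over a fixed answer vocabulary cannot score on answers it never saw during training.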
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Test
