A3Net: Adversarial-and-Attention Network for Machine Reading Comprehension
Jiuniu Wang, Xingyu Fu, Guangluan Xu, Yirong Wu, Ziyan Chen, Yang Wei, and Li Jin

TL;DR
A3Net is a novel machine reading comprehension model that combines adversarial training on multiple target variables with a multi-layer attention mechanism, leading to improved robustness and state-of-the-art performance on WebQA.
Contribution
The paper introduces A3Net, which integrates adversarial training on multiple target variables with a multi-layer attention network for machine reading comprehension.
Findings
A3Net outperforms state-of-the-art models on WebQA with a Fuzzy Score of 77.0%.
Adversarial training improves model robustness and generalization.
Multi-layer attention enhances question-passage interaction.
Abstract
In this paper, we introduce Adversarial-and-attention Network (A3Net) for Machine Reading Comprehension. This model extends existing approaches from two perspectives. First, adversarial training is applied to several target variables within the model, rather than only to the inputs or embeddings. We control the norm of adversarial perturbations according to the norm of original target variables, so that we can jointly add perturbations to several target variables during training. As an effective regularization method, adversarial training improves robustness and generalization of our model. Second, we propose a multi-layer attention network utilizing three kinds of high-efficiency attention mechanisms. Multi-layer attention conducts interaction between question and passage within each layer, which contributes to reasonable representation and understanding of the model. Combining these…
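The abstract's idea of tying the perturbation norm to the norm of each target variable can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the perturbation points along the loss gradient and has L2 norm equal to `epsilon` times the norm of the target variable. The function names, the `epsilon` parameter, and the variable names `embedding` and `hidden` are all hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def scaled_perturbation(target, grad, epsilon=0.1):
    """Perturbation along `grad` with L2 norm epsilon * ||target||.

    Assumption: the direction is the (normalized) loss gradient; the
    abstract only states that the perturbation norm is controlled by
    the norm of the original target variable.
    """
    g_norm = l2_norm(grad)
    if g_norm == 0.0:
        # No gradient signal, so no perturbation is applied.
        return [0.0] * len(target)
    scale = epsilon * l2_norm(target) / g_norm
    return [scale * g for g in grad]


def perturb_targets(targets, grads, epsilon=0.1):
    """Jointly perturb several named target variables with one epsilon."""
    perturbed = {}
    for name, value in targets.items():
        r = scaled_perturbation(value, grads[name], epsilon)
        perturbed[name] = [v + d for v, d in zip(value, r)]
    return perturbed


# Two hypothetical target variables on very different scales.
targets = {"embedding": [3.0, 4.0], "hidden": [30.0, 40.0]}
grads = {"embedding": [0.0, 2.0], "hidden": [1.0, 0.0]}
perturbed = perturb_targets(targets, grads, epsilon=0.1)
```

Because each perturbation is scaled relative to its own target variable, a single `epsilon` can be shared across variables of different magnitudes, which is one plausible reading of why the norm control makes joint perturbation practical.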
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
