BERTQA -- Attention on Steroids
Ankit Chadha, Rewa Sood

TL;DR
This paper enhances BERT for question answering by introducing directed coattention and localized feature extraction, significantly improving F1 scores on the SQuAD 2.0 dataset.
Contribution
The paper proposes a novel coattention mechanism, combined with convolutional features and skip connections, that boosts BERT's question-answering performance on SQuAD 2.0.
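As a rough illustration of directed coattention, the sketch below lets context tokens attend over question tokens (C2Q) and question tokens attend over context tokens (Q2C), then adds each result back to its input through a residual skip connection. It is written in plain Python to stay self-contained and rests on simplifying assumptions that do not come from the paper: a single attention head, no learned query/key/value projections, and keys doubling as values. All function names are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = tokens, columns = hidden features


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Scaled dot-product attention in which `queries` attend over `keys_values`.

    Assumption (not from the paper): single head, no learned projections,
    and the same vectors serve as both keys and values.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Each output row is a convex combination of the attended-over vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)])
    return out


def add_residual(x: Matrix, y: Matrix) -> Matrix:
    # Skip connection: element-wise sum of the input and the attention output.
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def coattention(context: Matrix, question: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical directed coattention: C2Q and Q2C, each with a skip connection."""
    c2q = add_residual(context, cross_attention(context, question))
    q2c = add_residual(question, cross_attention(question, context))
    return c2q, q2c
```

With a one-token question, every softmax weight is 1, so each C2Q row is simply its context vector plus that question vector, which makes a convenient sanity check.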
Findings
Coattention improves the no-answer F1 by 4 points in the base model and 1 point in the large model.
Adding convolution-based localized features further raises the dev-set F1 of the base architecture to 77.03.
An ensemble of models achieves a final F1 of 82.317 on the SQuAD 2.0 dataset.
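The F1 figures above are token-overlap scores. The sketch below is modeled on the logic of the official SQuAD 2.0 evaluation script as we understand it: answers are normalized (lowercased, punctuation and the articles a/an/the removed), F1 is computed from overlapping tokens, and an unanswerable question earns credit only when the prediction is also empty. The no-answer F1 is that score averaged over the unanswerable questions and reported as a percentage. Function names here are hypothetical.

```python
import re
import string
from collections import Counter
from typing import List

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    # Lowercase, strip punctuation, drop articles, collapse whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def token_f1(gold: str, pred: str) -> float:
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    if not gold_toks or not pred_toks:
        # No-answer case: full credit only if both gold and prediction are empty.
        return float(gold_toks == pred_toks)
    common = Counter(gold_toks) & Counter(pred_toks)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def example_f1(gold_answers: List[str], pred: str) -> float:
    # Unanswerable questions have no gold spans; treat them as one empty answer.
    golds = gold_answers or [""]
    return max(token_f1(g, pred) for g in golds)


def no_answer_f1(preds_on_unanswerable: List[str]) -> float:
    # Hypothetical helper: F1 restricted to unanswerable questions, as a percentage.
    if not preds_on_unanswerable:
        raise ValueError("need at least one unanswerable question")
    scores = [example_f1([], p) for p in preds_on_unanswerable]
    return 100.0 * sum(scores) / len(scores)
```

For example, predicting "Broncos" against the gold answer "Denver Broncos" gives precision 1, recall 1/2, and F1 2/3, while predicting "Paris" for an unanswerable question scores 0.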
Abstract
In this work, we extend the Bidirectional Encoder Representations from Transformers (BERT) with an emphasis on directed coattention to improve F1 performance on the SQuAD 2.0 dataset. The Transformer architecture on which BERT is based places hierarchical global attention on the concatenation of the context and query. Our additions to the BERT architecture augment this attention with a more focused context-to-query (C2Q) and query-to-context (Q2C) attention via a set of modified Transformer encoder units. In addition, we explore adding convolution-based feature extraction within the coattention architecture to add localized information to self-attention. We found that coattention significantly improves the no-answer F1 by 4 points in the base and 1 point in the large architecture. After adding skip connections, the no-answer F1 improved further without causing an additional…
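The convolution-based feature extraction described in the abstract can be pictured as a 1-D convolution sliding along the token axis, so each position mixes in its immediate neighbors, followed by a skip connection that adds those local features back onto the attention output. This is a sketch under stated assumptions, not the paper's configuration: an odd kernel width with zero padding, one scalar kernel shared across all feature channels, and a ReLU nonlinearity. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = hidden features


def conv1d_same(x: Matrix, kernel: List[float]) -> Matrix:
    """Slide an odd-width kernel along the token axis with zero padding.

    As in most deep-learning "convolutions", this is a cross-correlation
    (the kernel is not flipped). Assumption: one kernel shared by all channels.
    """
    k = len(kernel)
    if k % 2 != 1:
        raise ValueError("use an odd kernel width so output length matches input")
    half = k // 2
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        row = []
        for j in range(d):
            acc = 0.0
            for offset, w in enumerate(kernel):
                t = i + offset - half
                if 0 <= t < n:  # positions outside the sequence count as zero
                    acc += w * x[t][j]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def local_features_with_skip(attn_out: Matrix, kernel: List[float]) -> Matrix:
    """Hypothetical block: ReLU(conv(attn_out)) added back onto attn_out."""
    local = relu(conv1d_same(attn_out, kernel))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(attn_out, local)]
```

Because the skip path passes `attn_out` through unchanged, a kernel whose response is clipped to zero by the ReLU leaves the attention output untouched; the local features can only add information on top of it.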
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding
