AttentionMix: Data augmentation method that relies on BERT attention mechanism
Dominik Lewy, Jacek Mańdziuk

TL;DR
AttentionMix is an attention-guided data augmentation technique for NLP that transfers the Mixup idea to text using BERT's attention mechanism. On three standard sentiment classification datasets it outperforms two Mixup-based benchmark approaches as well as vanilla BERT.
Contribution
The paper presents AttentionMix, a novel mixing method for NLP that carries the Mixup idea over to textual data by relying on attention-based information from models such as BERT.
Findings
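AttentionMix outperforms two benchmark Mixup-based approaches on all three sentiment classification datasets evaluated.
It also outperforms the vanilla BERT method on the same three datasets.
Although the paper focuses on BERT, the authors state that the approach can be applied to generally any attention-based model.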
Abstract
The Mixup method has proven to be a powerful data augmentation technique in Computer Vision, with many successors that perform image mixing in a guided manner. One of the interesting research directions is transferring the underlying Mixup idea to other domains, e.g. Natural Language Processing (NLP). Even though there already exist several methods that apply Mixup to textual data, there is still room for new, improved approaches. In this work, we introduce AttentionMix, a novel mixing method that relies on attention-based information. While the paper focuses on the BERT attention mechanism, the proposed approach can be applied to generally any attention-based model. AttentionMix is evaluated on 3 standard sentiment classification datasets and in all three cases outperforms two benchmark approaches that utilize the Mixup mechanism, as well as the vanilla BERT method. The results confirm…
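As a rough illustration of the ideas named in the abstract, the sketch below shows standard Mixup (a convex combination of two inputs and their labels, with the ratio typically drawn from a Beta distribution) next to one hypothetical way attention scores could guide the mix per token. The function names, the per-token ratio att_a / (att_a + att_b), and the averaged label ratio are assumptions made for illustration only; the abstract does not say how AttentionMix computes its mixing weights, so this should not be read as the authors' method.

```python
import random


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing ratio from Beta(alpha, alpha), as is common for Mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, x_b, y_a, y_b, lam):
    """Vanilla Mixup: one scalar ratio mixes both the inputs and the labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def attention_guided_mix(emb_a, emb_b, att_a, att_b, y_a, y_b):
    """Hypothetical attention-guided variant (an assumption, not the paper's rule).

    emb_a, emb_b: token embeddings, one list of floats per token.
    att_a, att_b: one non-negative attention score per token.
    Each token position gets its own ratio; labels use the mean ratio.
    """
    lams = []
    for a, b in zip(att_a, att_b):
        total = a + b
        # Fall back to an even mix when neither token receives attention.
        lams.append(a / total if total > 0 else 0.5)

    mixed = [
        [lam * u + (1.0 - lam) * v for u, v in zip(tok_a, tok_b)]
        for lam, tok_a, tok_b in zip(lams, emb_a, emb_b)
    ]

    lam_bar = sum(lams) / len(lams)
    y = [lam_bar * a + (1.0 - lam_bar) * b for a, b in zip(y_a, y_b)]
    return mixed, y, lams


if __name__ == "__main__":
    lam = sample_lambda()
    x, y = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam)
    print("vanilla Mixup:", lam, x, y)

    mixed, y, lams = attention_guided_mix(
        emb_a=[[1.0, 1.0], [1.0, 1.0]],
        emb_b=[[0.0, 0.0], [0.0, 0.0]],
        att_a=[0.75, 0.25],
        att_b=[0.25, 0.75],
        y_a=[1.0, 0.0],
        y_b=[0.0, 1.0],
    )
    print("attention-guided sketch:", lams, mixed, y)
```

In the attention-guided sketch, tokens that receive more attention in one sentence contribute more of that sentence's embedding at that position, whereas vanilla Mixup applies the same ratio everywhere. With a real model, the per-token scores would have to be extracted from a chosen attention layer and head, a choice this sketch leaves open.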
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Residual Connection · Adam · Weight Decay · Dropout · Linear Layer · Layer Normalization · WordPiece · Multi-Head Attention · Linear Warmup With Linear Decay
