AttentionMix: Data augmentation method that relies on BERT attention mechanism

Dominik Lewy, Jacek Mańdziuk

arXiv:2309.11104·cs.CL·September 21, 2023·1 cites

Open Access

TL;DR

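AttentionMix is a data augmentation technique for NLP that uses attention-based information, here taken from BERT's attention mechanism, to guide Mixup-style mixing. On three standard sentiment classification datasets it outperforms two benchmark Mixup-based approaches as well as vanilla BERT.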

Contribution

The paper presents a novel attention-guided mixing method for NLP that transfers the Mixup idea to textual data by relying on attention-based information from models such as BERT.

Findings

01. AttentionMix outperforms benchmark Mixup methods on sentiment datasets.

02. Attention-based augmentation improves model performance over vanilla BERT.

03. The approach is applicable to any attention-based model.

Abstract

The Mixup method has proven to be a powerful data augmentation technique in Computer Vision, with many successors that perform image mixing in a guided manner. One interesting research direction is transferring the underlying Mixup idea to other domains, e.g. Natural Language Processing (NLP). Even though several methods already apply Mixup to textual data, there is still room for new, improved approaches. In this work, we introduce AttentionMix, a novel mixing method that relies on attention-based information. While the paper focuses on the BERT attention mechanism, the proposed approach can be applied to virtually any attention-based model. AttentionMix is evaluated on three standard sentiment classification datasets and in all three cases outperforms two benchmark approaches that utilize the Mixup mechanism, as well as the vanilla BERT method. The results confirm…
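The abstract describes the mixing idea only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. `mixup` shows the standard Mixup rule (a convex combination of two inputs and their labels with a ratio `lam`). `token_mix_weights` and `guided_mixup` are hypothetical helpers that illustrate one way attention scores could guide the mixing by turning them into per-token ratios; the normalization, the label ratio, and all names are assumptions made for illustration.

```python
import random


def mixup(x_a, x_b, y_a, y_b, lam):
    """Standard Mixup: convex combination of two inputs and their labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def token_mix_weights(attn_a, attn_b):
    """Hypothetical: per-token mixing ratios from two attention score vectors.

    A token position leans toward sample A when A's attention score there is
    larger than B's. This normalization is an assumption, not the paper's rule.
    """
    return [a / (a + b) if (a + b) > 0 else 0.5 for a, b in zip(attn_a, attn_b)]


def guided_mixup(emb_a, emb_b, y_a, y_b, attn_a, attn_b):
    """Hypothetical attention-guided variant: mix token embeddings position by
    position, then mix labels with the mean of the per-token ratios."""
    lams = token_mix_weights(attn_a, attn_b)
    mixed = [
        [lam * u + (1.0 - lam) * v for u, v in zip(tok_a, tok_b)]
        for lam, tok_a, tok_b in zip(lams, emb_a, emb_b)
    ]
    label_lam = sum(lams) / len(lams)
    y = [label_lam * a + (1.0 - label_lam) * b for a, b in zip(y_a, y_b)]
    return mixed, y, label_lam


if __name__ == "__main__":
    # In standard Mixup the ratio is usually drawn from Beta(alpha, alpha).
    lam = random.betavariate(0.4, 0.4)
    x, y = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam)
    print(lam, x, y)
```

In this sketch both sequences are assumed to be padded to the same length, and the embeddings are plain Python lists so the example runs without external libraries; a real pipeline would operate on BERT embedding tensors.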

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Residual Connection · Adam · Weight Decay · Dropout · Linear Layer · Layer Normalization · WordPiece · Multi-Head Attention · Linear Warmup With Linear Decay