TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding
Linyang Li, Xipeng Qiu

TL;DR
TAVAT introduces a token-aware virtual adversarial training method that generates fine-grained perturbations in NLP, significantly improving model robustness and performance on benchmarks like GLUE.
Contribution
The paper proposes a novel token-aware perturbation method with a token-level normalization ball, enhancing virtual adversarial training for NLP tasks.
Findings
Improves BERT's GLUE score from 78.3 to 80.9
Enhances sequence labeling and text classification performance
Demonstrates effectiveness across various NLP tasks
Abstract
Gradient-based adversarial training is widely used to improve the robustness of neural networks, but it cannot be easily adapted to natural language processing tasks because texts are discrete and cannot be perturbed by gradients directly. Virtual adversarial training, which instead generates perturbations in the embedding space, was introduced for NLP tasks as an alternative. Despite its success, existing virtual adversarial training methods generate perturbations coarsely, constrained by Frobenius normalization balls. To craft fine-grained perturbations, we propose a Token-Aware Virtual Adversarial Training method. We introduce a token-level accumulated perturbation vocabulary to initialize the perturbations better and use a token-level normalization ball to constrain these…
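The difference between a sequence-level Frobenius ball and a token-level ball can be illustrated with a small sketch. This is not the authors' implementation: the function names, the per-token L2 projection rule, the zero initialization for unseen tokens, and the overwrite-style vocabulary update are all assumptions made for illustration, since the abstract above is truncated before it gives those details.

```python
import math

def l2_norm(vec):
    """Euclidean norm of one token's perturbation vector."""
    return math.sqrt(sum(x * x for x in vec))

def project_frobenius(delta, eps):
    """Rescale the whole (tokens x dim) perturbation so that its
    Frobenius norm is at most eps. All tokens share one scale factor."""
    total = math.sqrt(sum(x * x for row in delta for x in row))
    if total <= eps or total == 0.0:
        return [list(row) for row in delta]
    scale = eps / total
    return [[x * scale for x in row] for row in delta]

def project_token_level(delta, eps):
    """Rescale each token's row independently so that its L2 norm is
    at most eps (assumed form of a token-level normalization ball)."""
    out = []
    for row in delta:
        n = l2_norm(row)
        scale = 1.0 if n <= eps or n == 0.0 else eps / n
        out.append([x * scale for x in row])
    return out

def init_from_vocab(token_ids, vocab, dim):
    """Initialize per-token perturbations from a token-indexed store.
    Unseen tokens start at zero (an assumption)."""
    return [list(vocab.get(t, [0.0] * dim)) for t in token_ids]

def update_vocab(token_ids, delta, vocab):
    """Write the final perturbations back into the store.
    A plain overwrite is used here only to keep the sketch short."""
    for t, row in zip(token_ids, delta):
        vocab[t] = list(row)

# Hypothetical two-token sequence with 2-dimensional embeddings.
delta = [[3.0, 4.0], [0.3, 0.4]]
print(project_frobenius(delta, 1.0))    # both rows shrink by the same factor
print(project_token_level(delta, 1.0))  # only the first row is rescaled
```

Under the Frobenius constraint, one large row forces every token's perturbation to shrink together, whereas the token-level version leaves the small second row untouched. That is one way to read "fine-grained" here, under the assumptions stated above.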
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques
Methods: Linear Layer · Adam · Softmax · Dense Connections · Dropout · Linear Warmup With Linear Decay · Layer Normalization · Attention Dropout · WordPiece
