MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning

Zhehua Zhong; Tianyi Chen; Zhen Wang

arXiv:2306.15826·cs.CL·June 29, 2023·1 cites

Open Access

TL;DR

This paper introduces MAT, a mixed-strategy adversarial training method for fine-tuning large-scale pre-trained language models. It aims to improve robustness and generalization by treating adversarial training as a mixed-strategy game and deriving its Nash equilibrium.

Contribution

The paper proposes a new adversarial training algorithm based on mixed strategies and Nash equilibrium, enhancing model performance beyond pure-strategy methods.

Findings

01 MAT outperforms state-of-the-art methods on GLUE and ANLI benchmarks.

02 MAT improves model robustness and generalization.

03 Extensive experiments validate the effectiveness of the proposed approach.

Abstract

Fine-tuning large-scale pre-trained language models has been shown to be effective for various natural language processing (NLP) tasks. Previous studies have established that incorporating adversarial training during the fine-tuning stage can significantly enhance model generalization and robustness. However, from the perspective of game theory, such uses of adversarial training correspond to pure-strategy games, which are inherently limited in the scope of their strategies and therefore leave room for improvement. To push the performance boundaries, we propose a novel Mixed-strategy Adversarial Training algorithm (MAT). Methodologically, we derive the Nash equilibrium of a mixed-strategy game for adversarial training using Entropy Mirror Descent, and establish MAT through a sampling method. To verify the effectiveness of MAT, we conducted extensive benchmark…
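To make the game-theoretic vocabulary in the abstract concrete, here is a minimal sketch of entropy mirror descent on a toy two-player zero-sum matrix game. It is not the MAT algorithm: MAT works over model parameters and adversarial perturbations and relies on sampling, whereas this example uses a small finite game in which the mixed strategies can be stored explicitly. The payoff matrix `A` and the helper names `entropy_md_step`, `solve_matrix_game`, and `duality_gap` are assumptions made for illustration only.

```python
import math


def entropy_md_step(p, loss, lr):
    """One entropy mirror descent step on the probability simplex.

    With the negative-entropy mirror map this is a multiplicative
    (exponentiated-gradient) update followed by renormalization.
    """
    weights = [pi * math.exp(-lr * li) for pi, li in zip(p, loss)]
    total = sum(weights)
    return [w / total for w in weights]


def solve_matrix_game(A, steps=20000, lr=0.01):
    """Approximate a mixed-strategy Nash equilibrium of min_x max_y x^T A y.

    Returns the time-averaged strategies of both players.
    """
    n, m = len(A), len(A[0])
    x = [1.0 / n] * n  # minimizing player's mixed strategy
    y = [1.0 / m] * m  # maximizing player's mixed strategy
    x_avg = [0.0] * n
    y_avg = [0.0] * m
    for _ in range(steps):
        # Accumulate the running average of the strategies actually played.
        x_avg = [a + xi / steps for a, xi in zip(x_avg, x)]
        y_avg = [a + yi / steps for a, yi in zip(y_avg, y)]
        # Expected payoff of each pure action against the opponent's mix.
        loss_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        gain_y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        # Simultaneous updates: x descends, y ascends (descends on -gain).
        x = entropy_md_step(x, loss_x, lr)
        y = entropy_md_step(y, [-g for g in gain_y], lr)
    return x_avg, y_avg


def duality_gap(A, x, y):
    """Best-response gap; it is zero exactly at a Nash equilibrium."""
    n, m = len(A), len(A[0])
    best_for_y = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_for_x = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_for_y - best_for_x


# Hypothetical 2x2 payoff matrix with no pure-strategy equilibrium.
A = [[2.0, -1.0], [-1.0, 1.0]]
x_avg, y_avg = solve_matrix_game(A)
print(x_avg, y_avg, duality_gap(A, x_avg, y_avg))
```

For this particular matrix neither player has a pure strategy that is safe against a best-responding opponent, and the mixed equilibrium for both players is (0.4, 0.6) with game value 0.2. The averaged strategies should land near those values with a small duality gap, which illustrates why a mixed-strategy formulation can reach an equilibrium that a pure-strategy one cannot.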

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Attention Dropout · WordPiece · Dense Connections · Adam · Residual Connection · Softmax