Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness
Jiahao Zhao, Wenji Mao

TL;DR
This paper proposes a novel disentangled representation learning approach based on information theory to improve adversarial robustness in NLP models by explicitly separating robust and non-robust features.
Contribution
It introduces a mutual information-based disentangled learning framework that explicitly separates robust and non-robust features for enhanced adversarial robustness in NLP.
Findings
Significantly outperforms existing methods under adversarial attacks.
Effectively disentangles robust and non-robust features.
Improves model reliability in NLP tasks.
Abstract
Adversarial vulnerability remains a major obstacle to constructing reliable NLP systems. When imperceptible perturbations are added to raw input text, the performance of a deep learning model may drop dramatically. Recent work argues that the adversarial vulnerability of a model is caused by the non-robust features learned in supervised training. Thus, in this paper, we tackle the adversarial robustness challenge from the view of disentangled representation learning, which is able to explicitly disentangle robust and non-robust features in text. Specifically, inspired by the variation of information (VI) in information theory, we derive a disentangled learning objective composed of mutual information terms that represent both the semantic representativeness of latent embeddings and the differentiation of robust and non-robust features. On the basis of this, we design a disentangled learning…
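The abstract refers to the variation of information (VI) without defining it. As a rough illustration only, the sketch below computes the standard textbook quantity VI(X; Y) = H(X) + H(Y) - 2 I(X; Y) for a small discrete joint distribution. It is not the paper's learning objective, which is not given in the text above; the function names and the toy distributions are assumptions made for this example.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginals(joint):
    """Row and column marginals of a joint table given as a list of rows."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    px, py = marginals(joint)
    flat = [p for row in joint for p in row]
    return entropy(px) + entropy(py) - entropy(flat)

def variation_of_information(joint):
    """VI(X; Y) = H(X) + H(Y) - 2 * I(X; Y)."""
    px, py = marginals(joint)
    return entropy(px) + entropy(py) - 2.0 * mutual_information(joint)

# Hypothetical toy distributions over two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y share no information
identical = [[0.5, 0.0], [0.0, 0.5]]        # Y is a copy of X

print(variation_of_information(independent))
print(variation_of_information(identical))
```

In this toy setting, VI is largest when the two variables are independent and falls to zero when one determines the other, which is the intuition behind using mutual information terms to pull apart two sets of features.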
Taxonomy
Topics: Adversarial Robustness in Machine Learning
