Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification
Chenhao Dang, Jing Ma

TL;DR
This paper introduces MC^2F, a method that improves adversarial robustness in text classification without sacrificing clean-data performance. It does so by modeling the manifold of clean sentence embeddings and correcting embeddings that fall off it.
Contribution
The paper proposes the Manifold-Correcting Causal Flow (MC^2F), a two-module system. A Stratified Riemannian Continuous Normalizing Flow (SR-CNF) models the clean data manifold, and a Geodesic Purification Solver corrects adversarial embeddings. Together they enhance robustness while preserving clean accuracy.
Findings
MC^2F achieves state-of-the-art adversarial robustness.
It preserves and even modestly improves clean data performance.
Extensive evaluations on three text classification datasets and multiple adversarial attacks confirm its effectiveness.
Abstract
A persistent challenge in text classification (TC) is that enhancing model robustness against adversarial attacks typically degrades performance on clean data. We argue that this challenge can be resolved by modeling the distribution of clean samples in the encoder embedding manifold. To this end, we propose the Manifold-Correcting Causal Flow (MC^2F), a two-module system that operates directly on sentence embeddings. A Stratified Riemannian Continuous Normalizing Flow (SR-CNF) learns the density of the clean data manifold. It identifies out-of-distribution embeddings, which are then corrected by a Geodesic Purification Solver. This solver projects adversarial points back onto the learned manifold via the shortest path, restoring a clean, semantically coherent representation. We conducted extensive evaluations on TC across three datasets and multiple adversarial…
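The abstract describes a detect-then-correct pipeline. A density model fitted to clean embeddings flags low-density (off-manifold) inputs, and a solver moves those inputs back toward high-density regions. The sketch below illustrates only that control flow, and it deliberately swaps in much simpler components than the paper's. A diagonal Gaussian fitted to clean embeddings stands in for the SR-CNF. Euclidean gradient ascent on the log-density stands in for geodesic projection onto a Riemannian manifold. All function names, the 5% quantile threshold, and the step-size rule are hypothetical choices for illustration, not details taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def fit_diag_gaussian(clean: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension mean and variance of clean embeddings.

    Hypothetical stand-in for the SR-CNF density model (not the paper's method).
    """
    n, d = len(clean), len(clean[0])
    mean = [sum(x[j] for x in clean) / n for j in range(d)]
    # A small floor keeps every variance strictly positive.
    var = [max(sum((x[j] - mean[j]) ** 2 for x in clean) / n, 1e-6) for j in range(d)]
    return mean, var


def log_density(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of x under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def density_threshold(clean: List[Vector], mean: Vector, var: Vector,
                      quantile: float = 0.05) -> float:
    """Log-density at the given quantile of the clean embeddings (assumed rule)."""
    scores = sorted(log_density(x, mean, var) for x in clean)
    return scores[int(quantile * (len(scores) - 1))]


def is_off_manifold(x: Vector, mean: Vector, var: Vector, threshold: float) -> bool:
    """Flag an embedding whose log-density falls below the clean threshold."""
    return log_density(x, mean, var) < threshold


def purify(x: Vector, mean: Vector, var: Vector, threshold: float,
           max_iters: int = 1000) -> Vector:
    """Gradient ascent on log-density until x is back above the threshold.

    Euclidean stand-in for the geodesic purification solver (assumption).
    """
    # Step of half the smallest variance: each coordinate moves at most halfway
    # toward the mean per iteration, so the log-density rises monotonically.
    step = 0.5 * min(var)
    rate = [step / v for v in var]
    x = list(x)
    for _ in range(max_iters):
        if log_density(x, mean, var) >= threshold:
            break
        # d/dx_j log p(x) = -(x_j - mean_j) / var_j for a diagonal Gaussian.
        x = [xi + r * (m - xi) for xi, r, m in zip(x, rate, mean)]
    return x


clean = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
mean, var = fit_diag_gaussian(clean)
threshold = density_threshold(clean, mean, var)
adversarial = [3.0, 3.0]
purified = purify(adversarial, mean, var, threshold)
print(is_off_manifold(adversarial, mean, var, threshold))  # -> True
print(purified)  # -> [0.8125, 0.8125]
```

On this toy data the off-manifold point is pulled toward the clean cluster and stops as soon as it clears the density threshold rather than collapsing onto the mean. That loosely mirrors the goal of moving a corrupted embedding only as far as needed. A real implementation along the paper's lines would replace the Gaussian with a learned flow and the straight-line updates with a manifold-aware solver.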
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Misinformation and Its Impacts
