Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification

Chenhao Dang; Jing Ma

arXiv:2511.07888 · cs.CL · February 2, 2026

TL;DR

This paper introduces MC^2F, a novel method that improves adversarial robustness in text classification without sacrificing performance on clean data by modeling and correcting embeddings on the data manifold.

Contribution

The paper proposes the Manifold-Correcting Causal Flow (MC^2F), a new two-module system that enhances robustness and preserves accuracy by modeling the clean data manifold and correcting adversarial embeddings.

Findings

1. MC^2F achieves state-of-the-art adversarial robustness.
2. It preserves and even modestly improves clean data performance.
3. Extensive evaluations on multiple datasets and attacks confirm its effectiveness.

Abstract

A persistent challenge in text classification (TC) is that enhancing model robustness against adversarial attacks typically degrades performance on clean data. We argue that this challenge can be resolved by modeling the distribution of clean samples in the encoder embedding manifold. To this end, we propose the Manifold-Correcting Causal Flow (MC^2F), a two-module system that operates directly on sentence embeddings. A Stratified Riemannian Continuous Normalizing Flow (SR-CNF) learns the density of the clean data manifold. It identifies out-of-distribution embeddings, which are then corrected by a Geodesic Purification Solver. This solver projects adversarial points back onto the learned manifold via the shortest path, restoring a clean, semantically coherent representation. We conducted extensive evaluations on text classification (TC) across three datasets and multiple adversarial…
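
The detect-then-correct idea in the abstract can be illustrated with a toy sketch. This is not the authors' SR-CNF or Geodesic Purification Solver: a diagonal Gaussian stands in for the learned density, a straight-line Euclidean step toward higher density stands in for the geodesic projection, and the names `fit_diag_gaussian`, `log_density`, and `purify`, the 5th-percentile threshold, and the synthetic 2-D "embeddings" are all assumptions made for illustration.

```python
import math
import random

def fit_diag_gaussian(points):
    """Fit a per-dimension mean and variance (a simple stand-in density model)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    var = [sum((p[j] - mean[j]) ** 2 for p in points) / n + 1e-6 for j in range(d)]
    return mean, var

def log_density(x, mean, var):
    """Log-likelihood of x under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def purify(x, mean, var, threshold, step=0.1, max_steps=200):
    """Move x toward higher density until it passes the threshold."""
    x = list(x)
    for _ in range(max_steps):
        if log_density(x, mean, var) >= threshold:
            break  # already in-distribution: leave the embedding untouched
        # Gradient of the log-density is (m - xi) / v; scaling it by v
        # gives a stable step straight toward the mean.
        x = [xi + step * (m - xi) for xi, m in zip(x, mean)]
    return x

random.seed(0)
# Toy 2-D "clean embeddings" (synthetic, for illustration only).
clean = [[random.gauss(0.0, 1.0), random.gauss(0.0, 0.5)] for _ in range(500)]
mean, var = fit_diag_gaussian(clean)

# Detection threshold: 5th percentile of clean log-densities (an assumed choice).
scores = sorted(log_density(p, mean, var) for p in clean)
threshold = scores[int(0.05 * len(scores))]

adversarial = [6.0, -4.0]  # hypothetical perturbed embedding, far from the clean data
is_ood = log_density(adversarial, mean, var) < threshold
purified = purify(adversarial, mean, var, threshold)
```

In this sketch, only embeddings flagged as out-of-distribution are modified, so an input that already scores above the threshold passes through unchanged. That is one plausible intuition for how a purification step could avoid hurting clean accuracy, though the paper's actual mechanism may differ.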

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Misinformation and Its Impacts