ContiGuard: A Framework for Continual Toxicity Detection Against Evolving Evasive Perturbations

Hankun Kang; Xin Miao; Jianhao Chen; Jintao Wen; Mayi Xu; Weiyu Zhang; Wenpeng Lu; Tieyun Qian

arXiv:2603.14843 · cs.CL · March 17, 2026

TL;DR

ContiGuard introduces a continual learning framework for toxicity detection that adapts to evolving evasive perturbations by enhancing semantic understanding and discriminative feature learning, maintaining detection robustness over time.

Contribution

This work is the first to propose a continual learning framework specifically designed for toxicity detection against evolving perturbations, incorporating semantic enrichment and discriminability strategies.

Findings

01. Improves detection robustness against evolving evasive tactics.
02. Enhances semantic comprehension of perturbed toxic content.
03. Maintains high detection accuracy over time despite perturbations.

Abstract

Toxicity detection mitigates the dissemination of toxic content (e.g., hateful comments, posts, and messages in online social interactions) to safeguard a healthy online social environment. However, malicious users persistently develop evasive perturbations to disguise toxic content and evade detectors. Traditional detection methods are static over time and cannot keep pace with these evolving evasion tactics. Continual learning is therefore a natural approach for dynamically updating detection ability against evolving perturbations. Nevertheless, disparities across perturbations hinder the detector's continual learning on perturbed text. More importantly, perturbation-induced noise distorts semantics, which degrades comprehension, and also impairs the learning of critical features, which makes detection sensitive to perturbations. These effects amplify the challenge of continual learning against…
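To make "evasive perturbations" and the case against static detectors concrete, here is a toy sketch. It is not ContiGuard: ContiGuard is a learned continual-learning detector, while this is a hypothetical rule-based stand-in. The two perturbations (look-alike character substitution and separator insertion), the `KeywordDetector` class, its `update` method, and the blocklisted word are all illustrative assumptions. The sketch shows a fixed detector missing a disguised message, and a detector that is updated as each new perturbation appears recovering it without losing its earlier fix.

```python
import re

# Hypothetical perturbations, chosen only for illustration.
LEET = str.maketrans({"a": "@", "i": "1", "o": "0", "e": "3", "s": "$"})
UNLEET = str.maketrans({"@": "a", "1": "i", "0": "o", "3": "e", "$": "s"})


def perturb_leet(text: str) -> str:
    """Swap letters for look-alike symbols, e.g. 'idiot' -> '1d10t'."""
    return text.translate(LEET)


def perturb_dots(text: str) -> str:
    """Insert dots between the characters of each word, e.g. 'idiot' -> 'i.d.i.o.t'."""
    return " ".join(".".join(word) for word in text.split())


class KeywordDetector:
    """Toy blocklist detector whose normalization steps can be extended over time."""

    def __init__(self, blocklist):
        self.blocklist = set(blocklist)
        self.normalizers = []  # accumulated, so earlier fixes are never dropped

    def update(self, normalizer):
        # Register a new normalization step for a newly observed perturbation.
        self.normalizers.append(normalizer)

    def is_toxic(self, text: str) -> bool:
        text = text.lower()
        for normalize in self.normalizers:
            text = normalize(text)
        tokens = re.findall(r"[a-z0-9@$]+", text)
        return any(token in self.blocklist for token in tokens)


if __name__ == "__main__":
    detector = KeywordDetector({"idiot"})
    clean = "you are an idiot"
    leet, dotted = perturb_leet(clean), perturb_dots(clean)

    # A static detector catches the clean text but misses the disguised one.
    print(detector.is_toxic(clean), detector.is_toxic(leet))  # True False

    # First update: undo look-alike substitutions.
    detector.update(lambda t: t.translate(UNLEET))
    print(detector.is_toxic(leet), detector.is_toxic(dotted))  # True False

    # Second update: strip dots between characters; the first fix still applies.
    detector.update(lambda t: re.sub(r"(?<=\w)\.(?=\w)", "", t))
    print(detector.is_toxic(leet), detector.is_toxic(dotted))  # True True
```

Appending normalizers rather than replacing them is a crude analogue of retaining earlier detection ability while adapting to new tactics. A hand-written rule list does not scale to open-ended perturbations, which is the gap a learned continual-learning detector is meant to fill.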

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Topic Modeling