NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

Junfeng Fang; Nachuan Chen; Houcheng Jiang; Dan Zhang; Fei Shen; Xiang Wang; Xiangnan He; Tat-Seng Chua

arXiv:2603.02219 · cs.LG · March 4, 2026

Open Access

TL;DR

NExT-Guard is a training-free, real-time safeguard for large language models. It monitors latent features from pretrained Sparse Autoencoders (SAEs) and outperforms supervised streaming safeguards without requiring token-level labels.

Contribution

The paper introduces a training-free streaming safeguard framework that uses pretrained Sparse Autoencoders (SAEs) to detect unsafe content in real time without token-level supervision.

Findings

1. Outperforms supervised streaming safeguards in robustness.

2. Works effectively across different models and risk scenarios.

3. Enables low-cost, scalable deployment of real-time safety measures.

Abstract

Large language models are increasingly deployed in streaming scenarios, rendering conventional post-hoc safeguards ineffective because they fail to interdict unsafe content in real time. While streaming safeguards based on token-level supervised training could address this, they require expensive annotations and suffer from severe overfitting. In this work, we challenge the paradigm that streaming safety must rely on token-level supervised training. Instead, we argue that streaming safety is an inherent capability of well-trained post-hoc safeguards, as they already encode token-level risk signals in their hidden representations. Hence, we introduce NExT-Guard, a training-free framework that achieves streaming safeguarding by monitoring interpretable latent features from Sparse Autoencoders (SAEs). It uses pretrained SAEs from publicly available base LLMs, enabling flexible, low-cost deployment without token-level…
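To make the idea of "monitoring latent features" concrete, here is a minimal sketch of a per-token streaming check. It is an illustration under stated assumptions, not the paper's implementation: the encoder form (ReLU of an affine map, a common SAE design), the names `sae_encode` and `StreamingMonitor`, the choice of risk-related feature indices, the max-activation score, and the threshold are all hypothetical, and the hidden states are toy vectors rather than real model activations.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def sae_encode(hidden: Vector, weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Assumed SAE encoder: latents = ReLU(W @ hidden + b)."""
    return [
        max(0.0, sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(weights, bias)
    ]


@dataclass
class StreamingMonitor:
    """Hypothetical monitor that halts a token stream on a risky latent."""

    weights: Sequence[Vector]     # SAE encoder weights (toy values here)
    bias: Vector                  # SAE encoder bias
    risk_features: Sequence[int]  # latent indices assumed to signal risk
    threshold: float              # assumed cutoff on latent activation

    def score(self, hidden: Vector) -> float:
        # Risk score = strongest activation among the chosen risk latents.
        latents = sae_encode(hidden, self.weights, self.bias)
        return max(latents[i] for i in self.risk_features)

    def guard(self, stream: Iterable[Tuple[str, Vector]]) -> Iterator[str]:
        # Emit tokens one by one; stop before the first token that scores
        # above the threshold, so later tokens are never emitted.
        for token, hidden in stream:
            if self.score(hidden) > self.threshold:
                return
            yield token


# Toy example: two latents over 2-dimensional hidden states; latent 1 is
# treated as the (hypothetical) risk feature.
monitor = StreamingMonitor(
    weights=[[1.0, 0.0], [0.0, 1.0]],
    bias=[0.0, 0.0],
    risk_features=[1],
    threshold=0.5,
)
toy_stream = [
    ("How", [1.0, 0.0]),
    ("to", [0.5, 0.1]),
    ("X", [0.2, 0.9]),   # risk latent fires here
    ("Y", [0.0, 0.0]),
]
emitted = list(monitor.guard(toy_stream))
```

In this toy run the stream is cut off at the third token, because its activation on the assumed risk latent exceeds the threshold. No training step appears anywhere in the sketch; the only inputs are fixed encoder weights, a set of feature indices, and a cutoff, which is the sense in which such a check could be training-free.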

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis