N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator

Zheyu Lin; Jirui Yang; Yukui Qiu; Hengqi Guo; Yubing Bao; Yao Guan

arXiv:2511.14195 · cs.LG · January 9, 2026

Open Access

TL;DR

N-GLARE evaluates LLM safety by analyzing the model's latent representations instead of generating text, which makes safety diagnostics faster and cheaper.

Contribution

The paper proposes a safety evaluation method based on latent representations, built around the JSS (Jensen-Shannon Separability) metric, that reduces cost and latency compared with traditional red teaming approaches.

Findings

01. JSS correlates strongly with the safety rankings derived from red teaming.

02. N-GLARE achieves similar discriminative results at less than 1% of the token and runtime cost.

03. The method enables real-time safety diagnostics without text generation.
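One common way to quantify agreement between two rankings is Spearman's rank correlation. The sketch below is only an illustration of that general technique; this summary does not say which consistency measure the authors use. The model scores are made-up placeholders, not results from the paper, and the assumption that a higher latent score should go with a lower attack success rate is ours.

```python
def ranks(values):
    """Rank values from 1..n (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder numbers for five hypothetical models (NOT from the paper).
latent_scores = [0.91, 0.40, 0.75, 0.62, 0.18]   # assumed: higher = safer
attack_success = [0.05, 0.48, 0.22, 0.20, 0.71]  # assumed: lower = safer

# Negate attack success so both lists are oriented "higher = safer".
rho = spearman(latent_scores, [-a for a in attack_success])
print(f"Spearman rho = {rho:.2f}")
```

A value of rho near 1 would mean the two evaluations order the models almost identically; a value near 0 would mean the rankings are unrelated.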

Abstract

Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream Red Teaming methods rely on online generation and black-box output analysis. These approaches are not only costly but also suffer from feedback latency, making them unsuitable for agile diagnostics after training a new model. To address this, we propose N-GLARE (A Non-Generative, Latent Representation-Efficient LLM Safety Evaluator). N-GLARE operates entirely on the model's latent representations, bypassing the need for full text generation. It characterizes hidden layer dynamics by analyzing the APT (Angular-Probabilistic Trajectory) of latent representations and introducing the JSS (Jensen-Shannon Separability) metric. Experiments on over 40 models and 20 red teaming strategies demonstrate that the JSS metric exhibits high consistency with the safety rankings derived from Red Teaming.…
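For readers unfamiliar with the quantity that gives JSS its name, the following minimal sketch computes the standard Jensen-Shannon divergence between two histograms. It is not the paper's JSS or APT definition, which this truncated abstract does not spell out. The scalar features, the bin settings, and the `histogram` helper are hypothetical stand-ins for whatever is extracted from the hidden states of benign and harmful prompts.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def histogram(values, bins, lo, hi):
    """Normalized equal-width histogram of values over [lo, hi] (hypothetical helper)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical per-prompt scalar features read from hidden states (made-up values).
benign = [0.10, 0.15, 0.20, 0.22, 0.30]
harmful = [0.80, 0.85, 0.90, 1.00, 1.10]

p = histogram(benign, bins=4, lo=0.0, hi=1.2)
q = histogram(harmful, bins=4, lo=0.0, hi=1.2)
separability = js_divergence(p, q)
print(f"JS divergence = {separability:.3f} bits")
```

With base-2 logarithms the divergence is 0 for identical distributions and 1 bit for distributions with disjoint support, so a larger value means the two prompt groups are easier to tell apart in this feature.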

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Software Testing and Debugging Techniques