CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing

Zixia Wang; Gaojie Jin; Jia Hu; Ronghui Mu

arXiv:2512.08967 · cs.LG · December 11, 2025

TL;DR

CluCERT introduces a clustering-guided denoising framework that improves the certification of LLM robustness against adversarial prompts by providing tighter bounds and reducing computational costs.

Contribution

The paper presents a novel clustering-based denoising approach that enhances robustness certification of LLMs with theoretical validation and efficiency improvements.

Findings

01 Yields tighter robustness bounds than existing methods

02 Achieves higher computational efficiency than existing methods

03 Remains effective across various downstream tasks and jailbreak scenarios

Abstract

Recent advancements in Large Language Models (LLMs) have led to their widespread adoption in daily applications. Despite their impressive capabilities, they remain vulnerable to adversarial attacks: even minor meaning-preserving changes, such as synonym substitutions, can lead to incorrect predictions. As a result, certifying the robustness of LLMs against such adversarial prompts is of vital importance. Existing approaches have focused on word deletion or simple denoising strategies to achieve robustness certification. However, these methods face two critical limitations: (1) they yield loose robustness bounds due to the lack of semantic validation for perturbed outputs, and (2) they suffer from high computational costs due to repeated sampling. To address these limitations, we propose CluCERT, a novel framework for certifying LLM robustness via clustering-guided denoising smoothing.…
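To make the "repeated sampling" cost mentioned in the abstract concrete, here is a minimal sketch of a generic word-deletion smoothing baseline. It is not CluCERT and not any specific prior method: the function names (`delete_words`, `smoothed_predict`, `toy_classifier`), the drop rate, the sample count, and the use of a one-sided Hoeffding bound are all assumptions chosen for illustration, and the toy classifier stands in for an LLM call.

```python
import math
import random
from collections import Counter


def delete_words(prompt, drop_rate, rng):
    """Randomly drop each word with probability drop_rate (hypothetical noise model)."""
    words = prompt.split()
    kept = [w for w in words if rng.random() >= drop_rate]
    # Fall back to the original prompt if every word was dropped.
    return " ".join(kept) if kept else prompt


def smoothed_predict(classify, prompt, n_samples=200, drop_rate=0.3, alpha=0.05, seed=0):
    """Majority vote over noisy copies of the prompt.

    Returns the winning label and a lower confidence bound on its vote
    probability, valid with probability at least 1 - alpha (Hoeffding).
    """
    rng = random.Random(seed)
    votes = Counter(
        classify(delete_words(prompt, drop_rate, rng)) for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    # One-sided Hoeffding bound: p >= p_hat - sqrt(ln(1/alpha) / (2n)).
    margin = math.sqrt(math.log(1.0 / alpha) / (2.0 * n_samples))
    p_lower = max(count / n_samples - margin, 0.0)
    return label, p_lower


def toy_classifier(text):
    """Stand-in for an LLM call (assumption): flags one keyword as negative."""
    return "negative" if "terrible" in text.split() else "positive"


if __name__ == "__main__":
    label, p_lower = smoothed_predict(toy_classifier, "the movie was great fun")
    print(label, round(p_lower, 3))
```

Each smoothed prediction in this sketch costs `n_samples` calls to the underlying model, which is the kind of sampling overhead the abstract describes. Note also that the vote accepts whatever the model returns for each noisy prompt, with no check that the perturbed input still carries the original meaning.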

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection