Advancing the Robustness of Large Language Models through Self-Denoised Smoothing
Jiabao Ji, Bairu Hou, Zhen Zhang, Guanhua Zhang, Wenqi Fan, Qing Li, Yang Zhang, Gaowen Liu, Sijia Liu, Shiyu Chang

TL;DR
This paper introduces self-denoised smoothing, a method that improves large language models' robustness against adversarial attacks by having the LLM itself denoise noise-corrupted inputs before prediction. It outperforms existing defenses in both empirical and certified robustness.
Contribution
It proposes a self-denoised smoothing technique leveraging LLM multitasking to improve robustness without additional model training or fine-tuning.
Findings
Outperforms existing methods in empirical robustness against adversarial attacks.
Achieves superior certified robustness for downstream tasks and jailbreak defenses.
Offers a more efficient and flexible robustness enhancement compared to previous denoising approaches.
Abstract
Although large language models (LLMs) have achieved significant success, their vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised considerable concerns. However, the increasing size of these models and their limited access make improving their robustness a challenging task. Among various defense strategies, randomized smoothing has shown great potential for LLMs, as it does not require full access to the model's parameters or fine-tuning via adversarial training. Randomized smoothing, however, adds noise to the input before model prediction, so the robustness of the final model largely depends on how well the model performs on this noise-corrupted data, and that performance is often sub-optimal. To address this issue, we propose to leverage the multitasking nature of LLMs to first denoise the noisy…
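The pipeline the abstract describes (corrupt the input with noise, denoise it, predict, then aggregate over many noisy copies) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-masking noise, the `<mask>` token, the majority vote, and every function name here are hypothetical choices, and `classify` and `denoise` stand in for calls to the same LLM.

```python
import random
from collections import Counter
from typing import Callable, Optional

MASK = "<mask>"  # hypothetical placeholder token; the abstract does not name one


def mask_words(text: str, mask_rate: float, rng: random.Random) -> str:
    """Replace each whitespace-separated word with MASK with probability mask_rate."""
    return " ".join(MASK if rng.random() < mask_rate else w for w in text.split())


def smoothed_predict(
    text: str,
    classify: Callable[[str], str],
    denoise: Optional[Callable[[str], str]] = None,
    num_samples: int = 20,
    mask_rate: float = 0.3,
    seed: int = 0,
) -> str:
    """Majority vote of `classify` over randomly masked copies of `text`.

    If `denoise` is given, each noisy copy is passed through it first,
    mirroring the "denoise, then predict" idea. In a self-denoised setup
    both callables would be prompts to the same LLM (assumption).
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_samples):
        noisy = mask_words(text, mask_rate, rng)
        if denoise is not None:
            noisy = denoise(noisy)
        votes[classify(noisy)] += 1
    # Return the label with the most votes.
    return votes.most_common(1)[0][0]


# Toy stand-ins for the LLM calls, for illustration only.
def toy_classify(text: str) -> str:
    return "positive" if "great" in text.split() else "negative"


def toy_denoise(text: str) -> str:
    # A real denoiser would ask the LLM to fill in the masked words;
    # this stand-in simply drops them.
    return " ".join(w for w in text.split() if w != MASK)


if __name__ == "__main__":
    print(smoothed_predict("a great movie", toy_classify, toy_denoise))
```

Passing the denoiser as an optional callable makes it easy to compare plain randomized smoothing (`denoise=None`) against the denoised variant on the same noisy samples, since both runs share the seed.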
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Randomized Smoothing · Denoised Smoothing
