Advancing the Robustness of Large Language Models through Self-Denoised Smoothing

Jiabao Ji; Bairu Hou; Zhen Zhang; Guanhua Zhang; Wenqi Fan; Qing Li; Yang Zhang; Gaowen Liu; Sijia Liu; Shiyu Chang

arXiv:2404.12274 · cs.CL · April 19, 2024 · 3 cites

Open Access · 1 Repo

TL;DR

This paper introduces self-denoised smoothing, a novel method that enhances large language models' robustness against adversarial attacks by denoising inputs before prediction, outperforming existing defenses in empirical and certified robustness.

Contribution

It proposes a self-denoised smoothing technique leveraging LLM multitasking to improve robustness without additional model training or fine-tuning.

Findings

01 Outperforms existing methods in empirical robustness against adversarial attacks.

02 Achieves superior certified robustness for downstream tasks and jailbreak defenses.

03 Offers a more efficient and flexible robustness enhancement compared to previous denoising approaches.

Abstract

Although large language models (LLMs) have achieved significant success, their vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised considerable concerns. However, the increasing size of these models and their limited access make improving their robustness a challenging task. Among various defense strategies, randomized smoothing has shown great potential for LLMs, as it does not require full access to the model's parameters or fine-tuning via adversarial training. However, randomized smoothing involves adding noise to the input before model prediction, and the final model's robustness largely depends on the model's performance on these noise-corrupted data. Its effectiveness is often limited by the model's sub-optimal performance on noisy data. To address this issue, we propose to leverage the multitasking nature of LLMs to first denoise the noisy…
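The flow the abstract describes (perturb the input, optionally denoise it, predict, then aggregate) can be illustrated with a minimal sketch. This is not the authors' implementation: word masking as the noise type, majority voting as the aggregation rule, and the names `mask_words`, `smoothed_predict`, `classify`, and `denoise` are all assumptions made for illustration. In practice `classify` and `denoise` would wrap calls to the same LLM with different prompts; here they are plain callables so the sketch runs without a model.

```python
import random
from collections import Counter
from typing import Callable, List, Optional

MASK = "<mask>"  # assumed placeholder token; not taken from the paper


def mask_words(words: List[str], mask_rate: float, rng: random.Random) -> List[str]:
    """Replace a random subset of words with MASK (assumed noise type)."""
    n_mask = int(round(mask_rate * len(words)))
    masked_idx = set(rng.sample(range(len(words)), n_mask))
    return [MASK if i in masked_idx else w for i, w in enumerate(words)]


def smoothed_predict(
    text: str,
    classify: Callable[[str], str],
    denoise: Optional[Callable[[str], str]] = None,
    mask_rate: float = 0.3,
    n_samples: int = 20,
    seed: int = 0,
) -> str:
    """Majority vote over predictions on randomly masked copies of `text`.

    If `denoise` is given, each noisy copy is passed through it before
    classification; this is the "denoise first, then predict" step.
    """
    rng = random.Random(seed)
    words = text.split()
    votes: Counter = Counter()
    for _ in range(n_samples):
        noisy = " ".join(mask_words(words, mask_rate, rng))
        if denoise is not None:
            noisy = denoise(noisy)  # hypothetical LLM call that fills in masks
        votes[classify(noisy)] += 1
    # Return the label with the most votes across the noisy copies.
    return votes.most_common(1)[0][0]
```

Calling `smoothed_predict(text, classify)` gives plain randomized smoothing, while passing a `denoise` callable adds the denoising step before each prediction. Comparing the two on the same noisy inputs is one way to see how much the base model's accuracy on corrupted text limits the smoothed prediction.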

Code & Models

Repositories

ucsb-nlp-chang/selfdenoise
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling

Methods: Randomized Smoothing · Denoised Smoothing