Certified Robustness for Large Language Models with Self-Denoising

Zhen Zhang; Guanhua Zhang; Bairu Hou; Wenqi Fan; Qing Li; Sijia Liu; Yang Zhang; Shiyu Chang

arXiv:2307.07171·cs.CL·July 17, 2023·5 cites

TL;DR

This paper introduces a self-denoising method leveraging large language models to improve certified robustness against noisy inputs, outperforming existing certification techniques in both theoretical and empirical robustness.

Contribution

The paper proposes a self-denoising approach that improves robustness certification of LLMs without requiring a separate denoising model, which makes it more efficient and flexible.

Findings

01 Outperforms existing certification methods in robustness.

02 Achieves larger certification radii.

03 Demonstrates improved empirical robustness.

Abstract

Although large language models (LLMs) have achieved great success in a vast range of real-world applications, their vulnerability to noisy inputs has significantly limited their use, especially in high-stakes environments. In these contexts, it is crucial to ensure that every prediction made by large language models is stable, i.e., LLM predictions should be consistent given minor differences in the input. This largely falls into the study of certified robust LLMs, i.e., all predictions of the LLM are certified to be correct in a local region around the input. Randomized smoothing has demonstrated great potential in certifying the robustness and prediction stability of LLMs. However, randomized smoothing requires adding noise to the input before model prediction, and its certification performance depends largely on the model's performance on corrupted data. As a result, its direct application…
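The smoothing procedure the abstract describes (add noise to the input, then predict) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: it assumes random token masking as the noise, and `toy_classify` and `toy_denoise` are hypothetical stand-ins for the LLM's prediction step and for a denoising step applied before prediction. The statistical test that turns vote counts into a certified radius is omitted.

```python
import random
from collections import Counter

MASK = "<mask>"  # assumed mask symbol, for illustration only


def mask_tokens(tokens, mask_rate, rng):
    """Replace a random subset of tokens with MASK (the smoothing noise)."""
    n_mask = int(round(mask_rate * len(tokens)))
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in masked_idx else t for i, t in enumerate(tokens)]


def smoothed_predict(tokens, classify, denoise=None,
                     mask_rate=0.3, n_samples=100, seed=0):
    """Majority vote of `classify` over randomly masked copies of the input.

    If `denoise` is given, it is applied to each masked copy first.
    Returns the winning label and its share of the votes.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy = mask_tokens(tokens, mask_rate, rng)
        if denoise is not None:
            noisy = denoise(noisy)
        votes[classify(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classify(tokens):
    """Hypothetical stand-in for an LLM classifier: counts sentiment words."""
    pos, neg = tokens.count("good"), tokens.count("bad")
    return "positive" if pos > neg else "negative"


def toy_denoise(tokens):
    """Hypothetical stand-in for a denoiser: simply drops masked tokens."""
    return [t for t in tokens if t != MASK]


if __name__ == "__main__":
    sentence = "the movie was good good good".split()
    print(smoothed_predict(sentence, toy_classify, denoise=toy_denoise))
```

The vote share returned by `smoothed_predict` is the quantity a certification procedure would bound. A model that predicts poorly on masked text yields a lower share, which is why performance on corrupted data matters.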

Code & Models

Repositories

ucsb-nlp-chang/selfdenoise (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Randomized Smoothing