CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models
Qian Lou, Xin Liang, Jiaqi Xue, Yancheng Zhang, Rui Xie, Mengxin Zheng

TL;DR
This paper introduces CR-UTP, a method that certifies the robustness of large language models against universal text perturbations by using superior prompt ensembling, achieving state-of-the-art results in maintaining prediction stability.
Contribution
The paper proposes a novel superior prompt search and ensembling approach to enhance certified robustness of language models against universal text perturbations.
Findings
Achieves state-of-the-art certified accuracy against both universal text perturbations (UTPs) and input-specific text perturbations (ISTPs).
Demonstrates the effectiveness of prompt ensembling in maintaining robustness.
Provides theoretical justification for using ensembles as base prompts.
Abstract
It is imperative to ensure the stability of every prediction made by a language model; that is, a language model's prediction should remain consistent despite minor input variations, like word substitutions. In this paper, we investigate the problem of certifying a language model's robustness against Universal Text Perturbations (UTPs), which have been widely used in universal adversarial attacks and backdoor attacks. Existing certified-robustness methods based on random smoothing have shown considerable promise in certifying robustness against input-specific text perturbations (ISTPs), operating under the assumption that any random alteration of a sample's clean or adversarial words would negate the impact of sample-wise perturbations. However, with UTPs, only masking the adversarial words themselves can eliminate the attack. A naive method is to simply increase the masking ratio and the likelihood of masking attack tokens,…
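As a rough illustration of why the masking ratio matters for UTPs, the sketch below computes the chance that a uniformly random mask happens to cover every adversarial token. It is a simple counting argument, not the paper's certification procedure; the function name and the uniform-mask assumption are illustrative only.

```python
from math import comb


def prob_all_adversarial_masked(n_tokens: int, n_adversarial: int, n_masked: int) -> float:
    """Probability that a uniformly random mask of n_masked positions
    (out of n_tokens) covers all n_adversarial attack tokens.

    Assumption: mask positions are drawn uniformly without replacement.
    """
    if not 0 <= n_adversarial <= n_tokens or not 0 <= n_masked <= n_tokens:
        raise ValueError("counts must satisfy 0 <= count <= n_tokens")
    if n_masked < n_adversarial:
        return 0.0  # too few masks to cover every attack token
    # Choose the remaining masked positions among the clean tokens,
    # divided by all possible masks of this size.
    return comb(n_tokens - n_adversarial, n_masked - n_adversarial) / comb(n_tokens, n_masked)


if __name__ == "__main__":
    # Hypothetical 20-token input carrying a 3-token universal trigger.
    for ratio in (0.3, 0.5, 0.7, 0.9):
        m = round(20 * ratio)
        print(f"mask ratio {ratio:.1f}: P(all trigger tokens masked) = "
              f"{prob_all_adversarial_masked(20, 3, m):.3f}")
```

Under these assumptions the probability only becomes large at high masking ratios, which is the regime where heavy masking also corrupts the clean part of the input.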
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Balanced Selection
