Fast Proxies for LLM Robustness Evaluation

Tim Beyer; Jan Schuchardt; Leo Schwinn; Stephan Günnemann

arXiv:2502.10487 · cs.CR · February 18, 2025

TL;DR

This paper introduces fast proxy metrics that predict how robust large language models are to costly adversarial attacks, substantially reducing the computational cost of robustness evaluation.

Contribution

The paper shows that simple proxy metrics can reliably estimate LLM robustness to an ensemble of attacks, enabling efficient evaluation without running the full attacks.

Findings

1. Proxy metrics correlate strongly with the attack success rates of the full attack ensemble (Pearson r = 0.87, Spearman r = 0.94).
2. Embedding-space attacks and direct prompting predict robustness effectively.
3. The method reduces computational cost by three orders of magnitude.
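A minimal sketch of how such a correlation could be computed, in plain Python. It assumes a hypothetical data layout in which, for each model, there is a per-attack, per-prompt success table for the expensive ensemble and a single attack success rate (ASR) for the cheap proxy. It also assumes that a prompt counts as broken by the ensemble if any of its attacks succeeds; the paper's exact aggregation may differ. The function names (`ensemble_asr`, `pearson`, `ranks`, `spearman`) and all numbers are illustrative, not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson (linear) correlation coefficient r_p."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Spearman rank correlation r_s: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def ensemble_asr(success):
    """success[attack][prompt] is True if that attack broke the model on that prompt.
    Assumption: a prompt counts as broken if ANY attack in the ensemble succeeds."""
    n_prompts = len(success[0])
    broken = [any(attack[p] for attack in success) for p in range(n_prompts)]
    return sum(broken) / n_prompts


# Hypothetical toy data for four models (NOT results from the paper):
# each model has two attacks x four prompts of success flags.
ensemble_runs = [
    [[True, False, False, False], [False, False, False, False]],
    [[True, True, False, False], [False, False, False, False]],
    [[True, True, False, False], [False, False, True, False]],
    [[True, True, True, True], [False, False, False, False]],
]
proxy_asr = [0.0, 0.1, 0.2, 0.4]  # e.g. ASR of one cheap attack per model

target_asr = [ensemble_asr(runs) for runs in ensemble_runs]  # [0.25, 0.5, 0.75, 1.0]
r_p = pearson(proxy_asr, target_asr)
r_s = spearman(proxy_asr, target_asr)
print(f"r_p = {r_p:.3f}, r_s = {r_s:.3f}")  # → r_p = 0.983, r_s = 1.000
```

The toy proxy orders the models exactly like the ensemble (r_s = 1) without being a linear function of it (r_p < 1), which is why reporting both a linear and a rank correlation is informative.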

Abstract

Evaluating the robustness of LLMs to adversarial attacks is crucial for safe deployment, yet current red-teaming methods are often prohibitively expensive. We compare the ability of fast proxy metrics to predict the real-world robustness of an LLM against a simulated attacker ensemble. This allows us to estimate a model's robustness to computationally expensive attacks without running the attacks themselves. Specifically, we consider gradient-descent-based embedding-space attacks, prefilling attacks, and direct prompting. Even though direct prompting in particular does not achieve a high attack success rate (ASR), we find that it and embedding-space attacks predict ASR well, achieving $r_{p} = 0.87$ (linear, Pearson) and $r_{s} = 0.94$ (Spearman rank) correlations with the full attack ensemble while reducing computational cost by three orders of magnitude.
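The gradient-descent-based embedding-space attack named in the abstract can be illustrated with a toy stand-in. This sketch assumes NumPy and replaces the LLM with a hypothetical model whose next-token logits are `W @ mean(input embeddings)`; it optimizes a few continuous "suffix" embeddings so that a chosen target token becomes likely. A real embedding-space attack backpropagates through the transformer and targets a whole response, so this only shows the mechanics: the adversarial vectors live in continuous embedding space and are never mapped back to real tokens. All names (`embedding_attack`, `target_loss`, the toy model) are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def target_loss(W, prompt_emb, suffix_emb, target):
    """Negative log-likelihood of `target` under the toy model.
    Toy model (assumption): logits = W @ mean of all input embeddings."""
    h = np.concatenate([prompt_emb, suffix_emb]).mean(axis=0)
    return float(-np.log(softmax(W @ h)[target]))


def embedding_attack(W, prompt_emb, n_suffix, target, lr=0.5, steps=100):
    """Plain gradient descent on continuous suffix embeddings.
    The optimized vectors are NOT projected back onto real token embeddings."""
    d = prompt_emb.shape[1]
    suffix = np.zeros((n_suffix, d))
    n_total = prompt_emb.shape[0] + n_suffix
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        h = np.concatenate([prompt_emb, suffix]).mean(axis=0)
        p = softmax(W @ h)
        # Softmax cross-entropy: dL/dh = W^T (p - onehot). Mean pooling gives
        # every suffix row the same gradient, scaled by 1 / n_total.
        grad_h = W.T @ (p - onehot)
        suffix = suffix - lr * grad_h / n_total  # broadcasts over suffix rows
    return suffix


# Toy usage with random weights (illustrative only).
rng = np.random.default_rng(0)
vocab, dim = 10, 8
W = rng.standard_normal((vocab, dim))
prompt = rng.standard_normal((5, dim))
target = 3

before = target_loss(W, prompt, np.zeros((4, dim)), target)
adv_suffix = embedding_attack(W, prompt, n_suffix=4, target=target)
after = target_loss(W, prompt, adv_suffix, target)
print(f"target NLL: {before:.3f} -> {after:.3f}")
```

The toy objective is convex in the suffix, so plain gradient descent with a small step reliably lowers it; that guarantee does not carry over to a real transformer.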
