ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large   Language Models

Xuxu Liu; Siyuan Liang; Mengya Han; Yong Luo; Aishan Liu; Xiantao Cai,; Zheng He; Dacheng Tao

arXiv:2502.18511·cs.CR·February 27, 2025

ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models

Xuxu Liu, Siyuan Liang, Mengya Han, Yong Luo, Aishan Liu, Xiantao Cai,, Zheng He, Dacheng Tao

PDF

Open Access 1 Video

TL;DR

ELBA-Bench is a comprehensive benchmark framework for evaluating backdoor attacks on large language models, covering various attack methods, datasets, and models, and providing insights into attack effectiveness and robustness.

Contribution

The paper introduces ELBA-Bench, a unified and extensive benchmark for backdoor attacks on LLMs, including over 1300 experiments and a universal toolbox for standardized research.

Findings

01

PEFT attacks outperform non-fine-tuning methods in classification tasks.

02

Optimized triggers improve attack robustness and cross-dataset generalization.

03

Backdoor techniques using task-relevant prompts enhance attack success while maintaining model performance.

Abstract

Generative large language models are crucial in natural language processing, but they are vulnerable to backdoor attacks, where subtle triggers compromise their behavior. Although backdoor attacks against LLMs are constantly emerging, existing benchmarks remain limited in terms of sufficient coverage of attack, metric system integrity, backdoor attack alignment. And existing pre-trained backdoor attacks are idealized in practice due to resource access constraints. Therefore we establish $ELBA-Bench$ , a comprehensive and unified framework that allows attackers to inject backdoor through parameter efficient fine-tuning ( $e.g.,$ LoRA) or without fine-tuning techniques ( $e.g.,$ In-context-learning). $ELBA-Bench$ provides over 1300 experiments encompassing the implementations of 12 attack methods, 18 datasets, and 12 LLMs. Extensive experiments provide new…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models· underline

Taxonomy

TopicsTopic Modeling · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education