Backdoor4Good: Benchmarking Beneficial Uses of Backdoors in LLMs

Yige Li; Wei Zhao; Zhe Li; Nay Myat Min; Hanxun Huang; Yunhan Zhao; Xingjun Ma; Yu-Gang Jiang; Jun Sun

arXiv:2603.07452 · cs.CR · March 10, 2026

TL;DR

Backdoor4Good introduces a benchmark for using backdoors in large language models to enhance safety and controllability, demonstrating their potential as beneficial tools rather than security threats.

Contribution

The paper formalizes beneficial backdoor learning in LLMs with a triplet framework. It also reports extensive experiments showing that beneficial backdoors are effective for trustworthy AI applications.

Findings

01. Beneficial backdoors achieve high controllability and tamper-resistance.
02. They maintain performance on clean tasks.
03. Backdoors can be modular and interpretable.

Abstract

Backdoor mechanisms have traditionally been studied as security threats that compromise the integrity of machine learning models. However, the same mechanism (the conditional activation of specific behaviors through input triggers) can also serve as a controllable and auditable interface for trustworthy model behavior. In this work, we present Backdoor4Good (B4G), a unified benchmark and framework for beneficial backdoor applications in large language models (LLMs). Unlike conventional backdoor studies focused on attacks and defenses, B4G repurposes backdoor conditioning for beneficial tasks that enhance safety, controllability, and accountability. It formalizes beneficial backdoor learning under a triplet formulation (T, A, U), representing the Trigger, Activation mechanism, and Utility function, and implements a benchmark covering four…
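
The (T, A, U) triplet can be illustrated with a toy simulation of the conditional behavior it describes. This is a minimal sketch under stated assumptions, not the B4G implementation. In the paper, backdoors are learned into the LLM's weights, whereas here a rule-based wrapper stands in for that learned behavior. Several parts of the sketch are hypothetical: modeling the activation mechanism A as a predicate on the prompt, the `BeneficialBackdoor` class, the triggers `[SAFE]` and `[WHOAMI]`, and the `toy_model` stand-in for an LLM. The sketch also shows why clean inputs are unaffected: when no trigger fires, the base model's output is returned unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BeneficialBackdoor:
    """Hypothetical container for one (T, A, U) triplet."""
    trigger: str                            # T: the input trigger string
    activation: Callable[[str, str], bool]  # A: assumed here to be a predicate (prompt, trigger) -> fires?
    utility: Callable[[str], str]           # U: transformation applied to the response when A fires


def substring_activation(prompt: str, trigger: str) -> bool:
    # Simplest possible activation rule (an assumption): the trigger appears verbatim.
    return trigger in prompt


def apply_backdoors(base_model: Callable[[str], str],
                    backdoors: list[BeneficialBackdoor],
                    prompt: str) -> str:
    """Run the base model, then apply the utility of every backdoor whose activation fires.

    With no trigger present, the base model's response is returned untouched,
    which mirrors the 'clean-task performance is preserved' property.
    """
    response = base_model(prompt)
    for bd in backdoors:
        if bd.activation(prompt, bd.trigger):
            response = bd.utility(response)
    return response


def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: reports the prompt length.
    return f"answer({len(prompt)})"


# Hypothetical beneficial behaviors: a safety trigger that forces a refusal,
# and an ownership trigger that appends a verifiable signature.
safety = BeneficialBackdoor("[SAFE]", substring_activation,
                            lambda r: "I can't help with that.")
signature = BeneficialBackdoor("[WHOAMI]", substring_activation,
                               lambda r: r + " <sig:owner-1234>")
backdoors = [safety, signature]

print(apply_backdoors(toy_model, backdoors, "What is 2+2?"))           # clean input
print(apply_backdoors(toy_model, backdoors, "[WHOAMI] What is 2+2?"))  # ownership trigger
print(apply_backdoors(toy_model, backdoors, "[SAFE] anything"))        # safety trigger
```

Keeping each triplet as a separate object reflects the modularity claim in the findings: behaviors can be added or removed without touching the base model. In a real trained model, however, triggers may interact in ways this wrapper does not capture.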

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Malware Detection Techniques