Backdoor defense, learnability and obfuscation

Paul Christiano; Jacob Hilton; Victor Lecomte; Mark Xu

arXiv:2409.03077·cs.LG·February 12, 2025

TL;DR

This paper formalizes the concept of defendability against backdoors in functions, linking it to learnability and obfuscation, and explores its computational complexity and relationship with VC dimension.

Contribution

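It introduces a formal, game-based notion of defendability against backdoors. In the computationally unbounded setting, it shows that defendability is essentially determined by the VC dimension of the function class. In the computationally bounded setting, it shows that efficient PAC learnability implies efficient defendability, but not the converse.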

Findings

01. In the computationally unbounded setting, defendability is essentially determined by the VC dimension of the function class.

02. Efficient PAC learnability implies efficient defendability, but not vice versa.

03. Polynomial-size circuits are not efficiently defendable, while decision trees are easier to defend than to learn.
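
For readers who want a concrete handle on the VC dimension mentioned in the first finding, the following is a minimal brute-force sketch for small finite domains. It is an illustration only, not code from the paper; the function names (`is_shattered`, `vc_dimension`) and the two toy classes (thresholds and intervals on six integer points) are assumptions chosen for the example.

```python
from itertools import combinations

def is_shattered(points, hypotheses):
    """True if the hypotheses realize every 0/1 labeling of the given points."""
    patterns = {tuple(h(x) for x in points) for h in hypotheses}
    return len(patterns) == 2 ** len(points)

def vc_dimension(domain, hypotheses):
    """Largest k such that some k-subset of the domain is shattered (brute force)."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(domain, k)):
            d = k
        else:
            # If no set of size k is shattered, no larger set can be either.
            break
    return d

domain = list(range(6))

# Toy class 1 (assumption): thresholds, h_t(x) = 1 iff x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in range(7)]

# Toy class 2 (assumption): closed intervals [a, b], plus the all-zero function.
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in domain for b in domain if a <= b]
intervals.append(lambda x: 0)

print(vc_dimension(domain, thresholds))
print(vc_dimension(domain, intervals))
```

Thresholds cannot label a smaller point 1 and a larger point 0, and intervals cannot realize the pattern 1, 0, 1 on three ordered points, which is why the search stops early for both classes.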

Abstract

We introduce a formal notion of defendability against backdoors using a game between an attacker and a defender. In this game, the attacker modifies a function to behave differently on a particular input known as the "trigger", while behaving the same almost everywhere else. The defender then attempts to detect the trigger at evaluation time. If the defender succeeds with high enough probability, then the function class is said to be defendable. The key constraint on the attacker that makes defense possible is that the attacker's strategy must work for a randomly-chosen trigger. Our definition is simple and does not explicitly mention learning, yet we demonstrate that it is closely connected to learnability. In the computationally unbounded setting, we use a voting algorithm of Hanneke et al. (2022) to show that defendability is essentially determined by the VC dimension of the…
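
The game described in the abstract can be made concrete with a toy simulation. The sketch below is an illustration under loud assumptions, not the paper's construction: the domain is the integers 0 to 999 under the uniform distribution, the function class is thresholds, the attacker (`backdoor`) flips the output on a single trigger point, and the defender (`defender_flags`) refits a threshold to labels produced by the function it is handed and flags an input when the function's output disagrees with the refit. All names are hypothetical. The defender is a simple learn-then-compare heuristic in the spirit of the learnability connection, not the voting algorithm of Hanneke et al. (2022), and the single-point flip is a simplification that may not be a strategy the paper's definition permits.

```python
import random

DOMAIN_SIZE = 1000  # toy assumption: inputs are integers 0..999, drawn uniformly

def make_threshold(t):
    """Threshold function: 1 for x >= t, else 0."""
    return lambda x: int(x >= t)

def backdoor(f, trigger):
    """Toy attacker: flip f on the trigger only, agree with f everywhere else."""
    return lambda x: 1 - f(x) if x == trigger else f(x)

def fit_threshold(samples, labels):
    """Empirical risk minimization over thresholds on the labeled sample."""
    candidates = sorted(set(samples)) + [DOMAIN_SIZE]

    def errors(t):
        return sum(int(x >= t) != y for x, y in zip(samples, labels))

    return min(candidates, key=errors)

def defender_flags(g, x, rng, n_samples=200):
    """Flag x if g(x) disagrees with a threshold refit from g's own labels."""
    samples = [rng.randrange(DOMAIN_SIZE) for _ in range(n_samples)]
    labels = [g(s) for s in samples]
    t_hat = fit_threshold(samples, labels)
    return g(x) != int(x >= t_hat)

clean = make_threshold(500)
attacked = backdoor(clean, trigger=123)

# The defender sees the attacked function together with the trigger ...
print(defender_flags(attacked, 123, random.Random(0)))
# ... or the clean function together with an ordinary input.
print(defender_flags(clean, 900, random.Random(0)))
```

In this toy setup the trigger stands out because a single flipped point barely affects the refit threshold, so the refit still predicts the original label there. A trigger that happens to fall very close to the true threshold could slip through, which is one way to see why the randomly chosen trigger matters for the defender.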
