Guarding the Gate: ConceptGuard Battles Concept-Level Backdoors in Concept Bottleneck Models

Songning Lai; Yu Huang; Jiayu Yang; Gaoxiang Huang; Wenshuo Chen; Yutao Yue

arXiv:2411.16512·cs.CR·November 26, 2024



Open Access

TL;DR

This paper introduces ConceptGuard, a novel defense framework that enhances the security of Concept Bottleneck Models against concept-level backdoor attacks, ensuring robustness without sacrificing interpretability.

Contribution

We propose ConceptGuard, the first defense tailored to concept-level backdoors in CBMs, offering theoretical robustness guarantees while maintaining model performance.

Findings

01. ConceptGuard effectively detects and mitigates concept-level backdoors.

02. The framework provides theoretical robustness guarantees.

03. Experimental results show improved security without loss of interpretability.

Abstract

The increasing complexity of AI models, especially in deep learning, has raised concerns about transparency and accountability, particularly in high-stakes applications like medical diagnostics, where opaque models can undermine trust. Explainable Artificial Intelligence (XAI) aims to address these issues by providing clear, interpretable models. Among XAI techniques, Concept Bottleneck Models (CBMs) enhance transparency by using high-level semantic concepts. However, CBMs are vulnerable to concept-level backdoor attacks, which inject hidden triggers into these concepts, leading to undetectable anomalous behavior. To address this critical security gap, we introduce ConceptGuard, a novel defense framework specifically designed to protect CBMs from concept-level backdoor attacks. ConceptGuard employs a multi-stage approach, including concept clustering based on text distance measurements…
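The visible abstract names only one stage explicitly: concept clustering based on text distance measurements. The Python sketch below illustrates that idea under stated assumptions, and is not the authors' implementation. Normalized Levenshtein distance stands in for the text distance, single-linkage clustering with a hypothetical `threshold` forms the concept groups, and, as a further assumption not stated in the excerpt, one hypothetical sub-classifier per group is combined by majority vote. All function names (`text_distance`, `cluster_concepts`, `ensemble_predict`) are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def text_distance(a: str, b: str) -> float:
    """Assumed text distance: edit distance normalised to [0, 1]."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_concepts(names: Sequence[str], threshold: float) -> list[list[int]]:
    """Single-linkage clustering (an assumption): concepts whose text distance
    is <= threshold are merged, transitively, into one group."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if text_distance(names[i], names[j]) <= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(names)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())  # groups of concept indices


def majority_vote(labels: Sequence[int]) -> int:
    """Most frequent label; ties go to the smallest label."""
    counts = Counter(labels)
    return min(counts, key=lambda lab: (-counts[lab], lab))


def ensemble_predict(concepts: Sequence[int],
                     groups: list[list[int]],
                     sub_models: Sequence[Callable[[list[int]], int]]) -> int:
    """Hypothetical voting stage: each sub-model sees only its own concept
    group, and the final label is the majority vote of their outputs."""
    votes = [model([concepts[i] for i in group])
             for model, group in zip(sub_models, groups)]
    return majority_vote(votes)


if __name__ == "__main__":
    names = ["red wing", "red wings", "blue wing",
             "long beak", "long beaks", "short tail"]
    groups = cluster_concepts(names, threshold=0.2)
    print(groups)  # [[0, 1], [2], [3, 4], [5]]

    def clean(sub: list[int]) -> int:
        return 0  # toy sub-model that always predicts the clean label

    def poisoned(sub: list[int]) -> int:
        return 9 if sub[0] == 1 else 0  # toy trigger on concept 5 -> label 9

    models = [clean, clean, clean, poisoned]
    print(ensemble_predict([1, 0, 0, 1, 0, 1], groups, models))  # 0
```

In this toy run the trigger concept falls into a single group, so only one of four sub-models is fooled and the vote still returns the clean label. This shows the intuition behind partitioning concepts; it says nothing about the paper's actual guarantee or trigger-size bound.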


Taxonomy

Topics: Data Stream Mining Techniques · Bayesian Modeling and Causal Inference