Guarding the Gate: ConceptGuard Battles Concept-Level Backdoors in Concept Bottleneck Models
Songning Lai, Yu Huang, Jiayu Yang, Gaoxiang Huang, Wenshuo Chen, Yutao Yue

TL;DR
This paper introduces ConceptGuard, a novel defense framework that enhances the security of Concept Bottleneck Models against concept-level backdoor attacks, ensuring robustness without sacrificing interpretability.
Contribution
We propose ConceptGuard, the first defense tailored to concept-level backdoors in CBMs, offering theoretical guarantees while maintaining model performance.
Findings
ConceptGuard effectively detects and mitigates concept-level backdoors.
The framework provides theoretical robustness guarantees.
Experimental results show improved security without loss of interpretability.
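The findings claim a theoretical robustness guarantee without stating its form. One common way to obtain such a guarantee, and an assumption here rather than the paper's stated theorem, is to train one classifier per disjoint concept group and combine them by majority vote. If a trigger touches concepts in at most k groups, at most k votes can change. Each changed vote lowers the winner's count by at most 1 and raises a rival's by at most 1, so the prediction is certified whenever the vote gap exceeds 2k. The function name `vote` and its tie-breaking rule are hypothetical.

```python
from collections import Counter


def vote(predictions: list[int]) -> tuple[int, int]:
    """Majority vote over per-group classifier labels (hypothetical aggregation).

    Returns (winning label, certified radius). The radius is the largest number
    of corrupted votes that provably cannot change the winner.
    """
    if not predictions:
        raise ValueError("need at least one sub-classifier prediction")
    counts = Counter(predictions)
    # Rank by descending vote count, then ascending label so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_label, top_count = ranked[0]
    runner_count = ranked[1][1] if len(ranked) > 1 else 0
    gap = top_count - runner_count
    # Each corrupted vote shrinks the gap by at most 2. Requiring the gap to stay
    # strictly positive (gap - 2k > 0) gives a conservative radius that does not
    # rely on tie-breaking; k = 0 corruptions is always safe, hence the clamp.
    radius = max((gap - 1) // 2, 0)
    return top_label, radius


# Seven hypothetical concept-group classifiers: five vote for class 2.
label, radius = vote([2, 2, 2, 2, 0, 1, 2])
print(label, radius)
```

With a 5-to-1 split the gap is 4, so one poisoned group cannot flip the decision, but two could force a tie. The strict-margin rule is chosen so the certificate does not depend on how ties are broken.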
Abstract
The increasing complexity of AI models, especially in deep learning, has raised concerns about transparency and accountability, particularly in high-stakes applications like medical diagnostics, where opaque models can undermine trust. Explainable Artificial Intelligence (XAI) aims to address these issues by providing clear, interpretable models. Among XAI techniques, Concept Bottleneck Models (CBMs) enhance transparency by using high-level semantic concepts. However, CBMs are vulnerable to concept-level backdoor attacks, which inject hidden triggers into these concepts, leading to undetectable anomalous behavior. To address this critical security gap, we introduce ConceptGuard, a novel defense framework specifically designed to protect CBMs from concept-level backdoor attacks. ConceptGuard employs a multi-stage approach, including concept clustering based on text distance measurements…
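The abstract mentions concept clustering based on text distance measurements, but it is cut off before naming the metric or the clustering method. The following sketch fills both gaps with assumptions: a character-level distance from Python's `difflib.SequenceMatcher` and single-linkage grouping under a hypothetical `threshold`. A real system would more plausibly compare sentence embeddings of the concept descriptions. `text_distance` and `cluster_concepts` are illustrative names, not ConceptGuard's API.

```python
from difflib import SequenceMatcher
from itertools import combinations


def text_distance(a: str, b: str) -> float:
    """Character-level distance in [0, 1]: 1 minus difflib's similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def cluster_concepts(concepts: list[str], threshold: float) -> list[list[str]]:
    """Single-linkage clustering: any pair closer than `threshold` is merged."""
    parent = list(range(len(concepts)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(concepts)), 2):
        if text_distance(concepts[i], concepts[j]) < threshold:
            parent[find(i)] = find(j)

    # Collect concepts by root, preserving first-seen order of clusters.
    groups: dict[int, list[str]] = {}
    for i, name in enumerate(concepts):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


concepts = ["striped tail", "long beak", "striped tails", "long beaks"]
print(cluster_concepts(concepts, threshold=0.5))
```

Near-duplicate concept names (here, singular and plural variants) fall into the same group, while unrelated names stay apart. Groups like these are what a per-group training and voting scheme would consume.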
Taxonomy
Topics: Data Stream Mining Techniques · Bayesian Modeling and Causal Inference
