Watermarking Counterfactual Explanations

Hangzhi Guo; Firdaus Ahmed Choudhury; Tinghua Chen; Amulya Yadav

arXiv:2405.18671 · cs.LG · October 22, 2024

TL;DR

This paper introduces CFMark, a watermarking framework that embeds detectable watermarks into counterfactual explanations to identify unauthorized model extraction attacks without compromising explanation quality.

Contribution

We propose a novel bi-level optimization-based watermarking method for CF explanations that enables secure detection of model theft attacks while maintaining explanation fidelity.

Findings

01  CFMark achieves an F-1 score of ~0.89 in attack detection.

02  Watermarking causes only ~1.3% degradation in explanation validity.

03  The framework is effective across diverse datasets and attack techniques.

Abstract

Counterfactual (CF) explanations for ML model predictions provide actionable recourse recommendations to individuals adversely impacted by predicted outcomes. However, despite being preferred by end-users, CF explanations have been shown to pose significant security risks in real-world applications; in particular, malicious adversaries can exploit CF explanations to perform query-efficient model extraction attacks on the underlying proprietary ML model. To address this security challenge, we propose CFMark, a novel model-agnostic watermarking framework for detecting unauthorized model extraction attacks relying on CF explanations. CFMark involves a novel bi-level optimization problem to embed an indistinguishable watermark into the generated CF explanation such that any future model extraction attacks using these watermarked CF explanations can be detected using a null hypothesis…
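
The detection step mentioned at the end of the abstract can be illustrated with a small hypothesis test. The abstract is truncated here and does not specify the exact test, so the following is only a minimal sketch under stated assumptions, not the authors' procedure: it assumes a suspect model has been queried on paired watermarked and unwatermarked CF explanations, and it uses a one-sided paired sign-flip permutation test as a stand-in. The function names, the score lists, and the 0.05 threshold are all hypothetical.

```python
import random
from statistics import mean


def sign_flip_p_value(watermarked_scores, plain_scores, n_permutations=2000, seed=0):
    """One-sided paired sign-flip permutation test (illustrative stand-in).

    H0: the suspect model scores watermarked and plain CFs the same on average.
    Returns an estimated p-value for the observed mean paired difference.
    """
    if len(watermarked_scores) != len(plain_scores):
        raise ValueError("score lists must be paired and equal in length")
    diffs = [w - p for w, p in zip(watermarked_scores, plain_scores)]
    observed = mean(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        # Under H0 each paired difference is equally likely to have either sign.
        permuted = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if permuted >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


def flag_extraction(watermarked_scores, plain_scores, alpha=0.05):
    """Flag the suspect model when H0 is rejected at level alpha (assumed threshold)."""
    return sign_flip_p_value(watermarked_scores, plain_scores) < alpha
```

A model that scores watermarked CFs consistently higher than their unwatermarked counterparts would be flagged, while a model that treats both identically yields a p-value of 1 and is not flagged.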

Code & Models

Repositories

BirkhoffG/CFMark (JAX, official)

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Digital and Cyber Forensics · Digital Media Forensic Detection

Methods: Sparse Evolutionary Training