EnsembleSHAP: Faithful and Certifiably Robust Attribution for Random Subspace Method

Yanting Wang; Jinyuan Jia

arXiv:2603.30034·cs.CR·April 1, 2026

TL;DR

EnsembleSHAP introduces a computationally efficient, robust, and secure feature attribution method specifically designed for the random subspace method, providing provable protection against various explanation-preserving attacks.

Contribution

It establishes the first provably robust and faithful feature attribution method tailored to the random subspace method, reusing the method's computational byproducts.

Findings

01. EnsembleSHAP is computationally efficient and maintains local accuracy.
02. It offers guaranteed protection against privacy-preserving attacks.
03. Evaluations show effectiveness against backdoor, adversarial, and jailbreak attacks.

Abstract

The random subspace method has wide security applications, such as providing certified defenses against adversarial and backdoor attacks and building robustly aligned LLMs against jailbreaking attacks. However, explanation of the random subspace method remains underexplored. Existing state-of-the-art feature attribution methods, such as Shapley value and LIME, are computationally impractical and lack security guarantees when applied to the random subspace method. In this work, we propose EnsembleSHAP, an intrinsically faithful and secure feature attribution method for the random subspace method that reuses its computational byproducts. Specifically, our feature attribution method 1) is computationally efficient, 2) maintains essential properties of effective feature attribution (such as local accuracy), and 3) offers guaranteed protection against privacy-preserving attacks on feature attribution…
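
To make the idea of reusing the ensemble's byproducts concrete, here is a minimal Python sketch. It assumes an equal-share rule: each base-model prediction on a random group of k features counts as one vote, and that vote is split evenly among the k sampled features. This rule is an illustrative assumption and is not confirmed by this page as the paper's exact definition; the function name, the toy base model, and all parameters are hypothetical. Under this rule, the attributions for a label sum to that label's vote fraction, which is one way to read the local accuracy property mentioned in the abstract.

```python
import random
from collections import Counter


def random_subspace_with_attribution(features, base_model, k, n_samples, seed=0):
    """Run a random-subspace ensemble and attribute its votes to features.

    Hypothetical helper for illustration only. `base_model` is any callable
    that takes a list of (index, value) pairs and returns a label.
    """
    rng = random.Random(seed)
    d = len(features)
    votes = Counter()
    attributions = {}  # label -> per-feature scores

    for _ in range(n_samples):
        # Sample k feature indices without replacement (one random subspace).
        idx = sorted(rng.sample(range(d), k))
        label = base_model([(i, features[i]) for i in idx])
        votes[label] += 1

        # Reuse the prediction already computed for the ensemble vote:
        # split this one vote evenly among the k sampled features (assumed rule).
        scores = attributions.setdefault(label, [0.0] * d)
        for i in idx:
            scores[i] += 1.0 / (k * n_samples)

    vote_fractions = {label: count / n_samples for label, count in votes.items()}
    return vote_fractions, attributions


def toy_base_model(subset):
    # Toy stand-in for a real classifier: flags "spam" if it sees the token "free".
    return "spam" if any(value == "free" for _, value in subset) else "ham"


tokens = ["win", "a", "free", "prize", "today"]
fractions, attr = random_subspace_with_attribution(
    tokens, toy_base_model, k=2, n_samples=200
)

# Attributions for each label add up to that label's share of the votes.
for label, fraction in fractions.items():
    print(label, round(fraction, 3), round(sum(attr[label]), 3))
```

No extra model queries are needed beyond the ones the ensemble already makes, which is the sense in which the attribution is a byproduct. In the toy run, the token "free" receives the largest "spam" score because every "spam" vote comes from a subset that contains it.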

Code & Models

Repositories

Wang-Yanting/EnsembleSHAP (GitHub)

Videos

EnsembleSHAP: Faithful and Certifiably Robust Attribution for Random Subspace Method (SlidesLive)