Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property

Yuya Yoshikawa; Masanari Kimura; Ryotaro Shimizu; Yuki Saito

arXiv:2405.14522·cs.LG·May 26, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a novel explanation method for black-box models that simultaneously estimates high- and low-level feature attributions by leveraging their nested structure and a new consistency property, improving explanation faithfulness and efficiency.

Contribution

It proposes a model-agnostic local explanation technique that exploits nested feature structures and introduces a consistency property to enhance attribution accuracy and coherence.

Findings

01. Accurately estimates high- and low-level feature attributions

02. Produces faithful and consistent explanations with fewer model queries

03. Effective in image and text classification tasks

Abstract

Techniques that explain the predictions of black-box machine learning models are crucial to make the models transparent, thereby increasing trust in AI systems. The input features to the models often have a nested structure that consists of high- and low-level features, and each high-level feature is decomposed into multiple low-level features. For such inputs, both high-level feature attributions (HiFAs) and low-level feature attributions (LoFAs) are important for better understanding the model's decision. In this paper, we propose a model-agnostic local explanation method that effectively exploits the nested structure of the input to estimate the two-level feature attributions simultaneously. A key idea of the proposed method is to introduce the consistency property that should exist between the HiFAs and LoFAs, thereby bridging the separate optimization problems for estimating them.…
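The abstract (truncated above) does not spell out the consistency property or the optimization used. As an illustration only, the sketch below assumes that consistency means each HiFA equals the sum of the LoFAs of its constituent low-level features, and it ties two linear-surrogate regressions together with a soft quadratic penalty solved as one stacked least-squares problem. This is a stand-in, not the authors' algorithm; `membership_matrix`, `joint_attributions`, `mu`, `lam`, and the toy linear black box are all hypothetical names chosen for the example.

```python
import numpy as np

def membership_matrix(groups, n_low):
    """G[j, d] = 1 if low-level feature d belongs to high-level feature j."""
    G = np.zeros((len(groups), n_low))
    for j, idx in enumerate(groups):
        G[j, idx] = 1.0
    return G

def joint_attributions(Zh, yh, Zl, yl, groups, mu=100.0, lam=1e-6):
    """Jointly fit HiFAs (a) and LoFAs (b) by minimizing
    ||Zh a - yh||^2 + ||Zl b - yl||^2 + mu ||a - G b||^2 + lam (||a||^2 + ||b||^2).
    The mu term is the ASSUMED consistency penalty (HiFA = sum of its LoFAs)."""
    J, D = Zh.shape[1], Zl.shape[1]
    G = membership_matrix(groups, D)
    A = np.vstack([
        np.hstack([Zh, np.zeros((Zh.shape[0], D))]),   # high-level surrogate fit
        np.hstack([np.zeros((Zl.shape[0], J)), Zl]),   # low-level surrogate fit
        np.sqrt(mu) * np.hstack([np.eye(J), -G]),      # consistency penalty rows
        np.sqrt(lam) * np.eye(J + D),                  # small ridge term
    ])
    b = np.concatenate([yh, yl, np.zeros(J), np.zeros(J + D)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w[:J], w[J:]

# Toy setup: 3 high-level features containing 8 low-level features in total.
rng = np.random.default_rng(0)
groups = [[0, 1, 2], [3, 4], [5, 6, 7]]
true_w = rng.normal(size=8)
G = membership_matrix(groups, 8)

def black_box(low_mask):
    # Hypothetical model: linear in the binary low-level mask.
    return low_mask @ true_w

# High-level perturbations switch whole groups on or off;
# low-level perturbations switch individual features.
Zh = rng.integers(0, 2, size=(20, 3)).astype(float)
yh = black_box(Zh @ G)
Zl = rng.integers(0, 2, size=(40, 8)).astype(float)
yl = black_box(Zl)

hifa, lofa = joint_attributions(Zh, yh, Zl, yl, groups)
gap = hifa - G @ lofa  # per-group consistency violation
print("max consistency gap:", np.abs(gap).max())
```

Because the toy black box is exactly linear, the two surrogates already agree and the penalty has little to do; with a nonlinear model and few perturbations, raising `mu` would trade some fit for tighter agreement between the two levels.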

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 3

Strengths

- The paper proposes to estimate HiFAs and LoFAs simultaneously, while previous works have only estimated them separately.
- The consistency property proposed by the authors is a reasonable property for HiFAs and LoFAs.
- The paper does experiments that ablate different important properties on both image and text datasets.
- The proposed method performed better than all baselines in all the metrics, and can get better attributions with a smaller number of perturbations.
- The method can be used…

Weaknesses

- The high-level features are constrained to predefined image/sentence and cannot be dynamically chosen by the method.
- With the bottom-up baselines, we can actually convert any feature attribution method to the BU version of it. The paper only compares with different versions of LIME and MILLI. It would be more convincing that faithfulness and consistency cannot be achieved together with more baselines such as (BU-)SHAP, (BU-)RISE[1], IntGrad[2].
- Insertion and deletion are only proxy metrics…

Reviewer 02 · Rating 3 · Confidence 4

Strengths

- Overall, the paper is well-written and well-organized, allowing readers to follow the narrative easily.
- The HiFAs and LoFAs estimated by the proposed method in this paper are consistent.

Weaknesses

- Lines 297-305: The settings of TD-LIME and TD-MILLI are unclear. Additional explanation is needed to help readers understand their significance.
- This paper lacks a formal introduction to the evaluation metrics NDCG (line 327) and HIML (line 339).
- Figure 3 lacks an introduction to "IA," and the font size should be enlarged.
- The experiments in this paper are conducted solely on synthetic datasets and lack results on real datasets, such as medical imaging data.
- In Figure 5, BU-LIME demons…

Reviewer 03 · Rating 3 · Confidence 3

Strengths

**Originality:**
- The idea of jointly estimating high-level and low-level feature attributions with consistency constraints is novel and well-motivated.

**Quality:**
- Experiments are comprehensive, evaluating on both CV and NLP tasks.
- Quantitative metrics assess different aspects like correctness, faithfulness, and consistency of the attributions.

**Clarity:**
- The paper is generally well-written with clear visualization.

**Significance:**
- The method enables more coherent explanation…

Weaknesses

- One fundamental assumption of this paper is that “the input features have a nested structure that consists of high- and low-level features, and each high-level feature is decomposed into multiple low-level features.”
- Another important assumption is that there is consistency between low- and high-level features. However, what if there are conflicting attributions between the high and low levels? For instance, when a high-level feature is deemed important, but none of its constituent low-level features…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Healthcare