Fixing confirmation bias in feature attribution methods via semantic match

Giovanni Cinà; Daniel Fernandez-Llaneza; Ludovico Deponte; Nishant Mishra; Tabea E. Röber; Sandro Pezzelle; Iacer Calixto; Rob Goedhart; Ş. İlker Birbil

arXiv:2307.00897·cs.LG·February 27, 2024·1 cites

Open Access

TL;DR

This paper introduces a structured 'semantic match' approach to improve the reliability of feature attribution methods in AI, helping users better interpret model explanations in terms of human concepts and reduce confirmation bias.

Contribution

It proposes a novel framework for evaluating semantic match between human concepts and explanations, addressing a key flaw in existing feature attribution methods.

Findings

01. Semantic match assessment reveals both desirable and undesirable model behaviors.

02. The approach improves interpretability by aligning explanations with human concepts.

03. Experimental validation across tabular and image data demonstrates effectiveness.

Abstract

Feature attribution methods have become a staple method to disentangle the complex behavior of black box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model's internal representations, and confirmation bias can trick users into false beliefs about model behavior. We argue that a structured approach is required to test whether our hypotheses on the model are confirmed by the feature attributions. This is what we call the "semantic match" between human concepts and (sub-symbolic) explanations. Building on the conceptual framework put forward in Cinà et al. [2023], we propose a structured approach to evaluate semantic match in practice. We showcase…
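The idea of testing a hypothesis against feature attributions, instead of eyeballing them, can be illustrated with a toy sketch. This is not the procedure from the paper: the linear model, the `linear_attributions` and `semantic_match_score` helpers, and the use of Pearson correlation as the agreement measure are all assumptions made here for illustration only.

```python
import numpy as np

def linear_attributions(weights, X):
    """Per-feature contributions w_i * x_i of a linear model (zero baseline).
    Hypothetical stand-in for any feature attribution method."""
    return X * weights

def semantic_match_score(concept, attribution_stat):
    """Pearson correlation between a human-defined concept (one value per
    sample) and a statistic computed from the explanations.
    Assumption: correlation is used here only as a simple agreement measure."""
    concept = np.asarray(concept, dtype=float)
    attribution_stat = np.asarray(attribution_stat, dtype=float)
    return float(np.corrcoef(concept, attribution_stat)[0, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # synthetic tabular data
weights = np.array([2.0, -1.0, 0.0])     # hypothetical model coefficients
attr = linear_attributions(weights, X)

# Hypothesis A: a high value of feature 0 shows up as a large
# attribution to feature 0.
score_supported = semantic_match_score(X[:, 0], attr[:, 0])

# Hypothesis B: a high value of feature 1 shows up as a large
# attribution to feature 0 (it should not, the features are independent).
score_unsupported = semantic_match_score(X[:, 1], attr[:, 0])

print(f"hypothesis A agreement: {score_supported:.3f}")
print(f"hypothesis B agreement: {score_unsupported:.3f}")
```

In this sketch a hypothesis is stated up front as a measurable concept and then scored over the whole dataset, so a user cannot confirm it by picking out a few explanations that happen to look right.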

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Advanced Graph Neural Networks