Promises and Pitfalls of Black-Box Concept Learning Models

Anita Mahinpei; Justin Clark; Isaac Lage; Finale Doshi-Velez; Weiwei Pan

arXiv:2106.13314·cs.LG·June 28, 2021·20 cites


Open Access 2 Repos

TL;DR

This paper investigates the limitations of black-box concept learning models, revealing that they often encode unintended information, which can mislead interpretations despite mitigation efforts.

Contribution

It uncovers the mechanisms of information leakage in concept learning models and proposes strategies to mitigate these issues.

Findings

01. Concept representations encode unintended information

02. Mitigation strategies often fail to prevent information leakage

03. Interpretability of models can be misleading due to hidden information

Abstract

Machine learning models that incorporate concept learning as an intermediate step in their decision-making process can match the performance of black-box predictive models while retaining the ability to explain outcomes in human-understandable terms. However, we demonstrate that the concept representations learned by these models encode information beyond the pre-defined concepts, and that natural mitigation strategies do not fully work, rendering the interpretation of the downstream prediction misleading. We describe the mechanism underlying the information leakage and suggest recourse for mitigating its effects.
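As a rough illustration of the kind of leakage the abstract describes, the sketch below builds a toy two-stage model: inputs are mapped to concept scores, and a label is predicted from those scores alone. It is not the authors' code. The weights, the inputs, and the choice of sigmoid-valued ("soft") concept scores are all assumptions made for this example. It shows one way a concept representation could carry more than the concepts themselves: two inputs that agree on every thresholded concept can still receive different labels, because the real-valued scores differ.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def concept_model(x, weights):
    """Map input features to soft concept scores in (0, 1)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def label_model(concepts, weights, bias):
    """Predict a binary label from the concept representation alone."""
    score = sum(w * c for w, c in zip(weights, concepts)) + bias
    return 1 if score > 0 else 0

def harden(concepts, threshold=0.5):
    """Threshold soft scores into binary present/absent concepts."""
    return [1.0 if c > threshold else 0.0 for c in concepts]

# Hypothetical toy parameters, chosen only for illustration.
CONCEPT_W = [[1.0, 0.0], [0.0, 1.0]]
LABEL_W = [1.0, 1.0]
LABEL_B = -1.5

# Two hypothetical inputs: both activate both concepts, with different strength.
x_a = [0.5, 0.5]
x_b = [3.0, 3.0]

soft_a = concept_model(x_a, CONCEPT_W)
soft_b = concept_model(x_b, CONCEPT_W)

# The binary concepts are identical for the two inputs...
same_hard_concepts = harden(soft_a) == harden(soft_b)

# ...yet the label predicted from the soft scores differs.
label_soft_a = label_model(soft_a, LABEL_W, LABEL_B)
label_soft_b = label_model(soft_b, LABEL_W, LABEL_B)

# Predicting from the hardened concepts removes the difference in this toy case.
label_hard_a = label_model(harden(soft_a), LABEL_W, LABEL_B)
label_hard_b = label_model(harden(soft_b), LABEL_W, LABEL_B)

print(same_hard_concepts, label_soft_a, label_soft_b, label_hard_a, label_hard_b)
```

In this toy setting, an explanation of the form "both concepts are present" would be the same for the two inputs even though the downstream predictions disagree. Thresholding closes the gap here only because the example is so small; the abstract reports that natural mitigation strategies do not fully work in general.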

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification