Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example
Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah

TL;DR
Bayes-TrEx is a Bayesian sampling framework that identifies in-distribution examples with specific prediction confidences to enhance neural network interpretability, revealing model behaviors beyond standard test set analysis.
Contribution
The paper introduces a flexible Bayesian sampling method for model inspection: given a data distribution, it finds in-distribution examples on which a neural network predicts with a specified target confidence, so that behaviors rarely triggered by the test set can still be analyzed.
Findings
Reveals highly confident misclassifications and ambiguous examples
Visualizes class boundaries and extrapolation behaviors
Exposes neural network overconfidence in various datasets
Abstract
Post-hoc explanation methods are gaining popularity for interpreting, understanding, and debugging neural networks. Most analyses using such methods explain decisions in response to inputs drawn from the test set. However, the test set may have few examples that trigger some model behaviors, such as high-confidence failures or ambiguous classifications. To address these challenges, we introduce a flexible model inspection framework: Bayes-TrEx. Given a data distribution, Bayes-TrEx finds in-distribution examples with a specified prediction confidence. We demonstrate several use cases of Bayes-TrEx, including revealing highly confident (mis)classifications, visualizing class boundaries via ambiguous examples, understanding novel-class extrapolation behavior, and exposing neural network overconfidence. We use Bayes-TrEx to study classifiers trained on CLEVR, MNIST, and Fashion-MNIST, and…
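The core idea in the abstract, finding in-distribution examples with a specified prediction confidence, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy 2-D setting in which a standard normal prior stands in for the data distribution, a hand-written logistic function stands in for the trained classifier, the exact confidence constraint is relaxed into a Gaussian penalty of width `sigma`, and a plain random-walk Metropolis sampler draws from the resulting posterior. All names (`classifier_confidence`, `log_posterior`, `sample_examples`) and parameter values are hypothetical.

```python
import numpy as np

def classifier_confidence(x):
    """Toy stand-in for a trained classifier: class-1 confidence (hypothetical)."""
    w = np.array([2.0, -1.0])  # arbitrary illustrative weights
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def log_posterior(x, target, sigma):
    # Log prior: a standard normal stands in for the data distribution (assumption).
    log_prior = -0.5 * np.sum(x ** 2)
    # Relaxed log likelihood: Gaussian penalty on the gap to the target confidence.
    log_lik = -0.5 * ((classifier_confidence(x) - target) / sigma) ** 2
    return log_prior + log_lik

def sample_examples(target, sigma=0.05, n_steps=5000, step_size=0.3, seed=0):
    """Random-walk Metropolis over inputs x; returns samples after burn-in."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    logp = log_posterior(x, target, sigma)
    samples = []
    for _ in range(n_steps):
        proposal = x + step_size * rng.normal(size=2)
        logp_new = log_posterior(proposal, target, sigma)
        # Accept with probability min(1, exp(logp_new - logp)).
        if np.log(rng.uniform()) < logp_new - logp:
            x, logp = proposal, logp_new
        samples.append(x.copy())
    return np.array(samples[n_steps // 2:])  # discard the first half as burn-in

# Ambiguous examples: inputs the toy classifier scores near 0.5.
samples = sample_examples(target=0.5)
confidences = classifier_confidence(samples)
```

Setting `target` close to 1 instead would look for inputs the toy classifier is highly confident about. For image data, a sketch like this would more plausibly sample in the latent space of a generative model than directly in input space, and a gradient-based sampler would likely replace the random walk; both are assumptions that this summary does not confirm.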
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications
Methods: Test
