Can Explanations Be Useful for Calibrating Black Box Models?
Xi Ye, Greg Durrett

TL;DR
This paper explores how explanations of a black box NLP model can be used to calibrate it on a new domain, especially when target-domain data is limited, by training a simple calibrator (a classifier) that predicts whether the base model's output is correct.
Contribution
It introduces a method that leverages model explanations and human intuition to calibrate black box models for domain adaptation in NLP tasks.
Findings
Explanations improve model calibration across domain shifts.
Calibration boosts accuracy in selective settings, where the model may abstain instead of returning a prediction on every example.
Calibration transferability varies between tasks.
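
To make the selective-prediction finding concrete, here is a minimal sketch of how accuracy at a given coverage level can be computed: rank examples by calibrator confidence, keep only the most confident fraction, and measure accuracy on what is kept. The function name `accuracy_at_coverage` and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def accuracy_at_coverage(confidences, correct, coverage):
    """Accuracy on the top `coverage` fraction of examples, ranked by confidence.

    confidences: calibrator scores, one per example (higher = more trusted)
    correct:     1 if the base model was right on that example, else 0
    coverage:    fraction of examples the system must answer, in (0, 1]
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    # Most confident examples first.
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    # Always answer at least one example.
    k = max(1, round(coverage * len(ranked)))
    kept = ranked[:k]
    return sum(is_right for _, is_right in kept) / k


# Toy data (made up for illustration only).
toy_confidences = [0.9, 0.8, 0.6, 0.4, 0.2]
toy_correct = [1, 1, 0, 1, 0]

full_accuracy = accuracy_at_coverage(toy_confidences, toy_correct, 1.0)
selective_accuracy = accuracy_at_coverage(toy_confidences, toy_correct, 0.4)
```

A better calibrator ranks correct predictions above incorrect ones more reliably, so accuracy on the retained examples rises as coverage drops.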
Abstract
NLP practitioners often want to take existing trained models and apply them to data from new domains. While fine-tuning or few-shot learning can be used to adapt a base model, there is no single recipe for making these techniques work; moreover, one may not have access to the original model weights if it is deployed as a black box. We study how to improve a black box model's performance on a new domain by leveraging explanations of the model's behavior. Our approach first extracts a set of features combining human intuition about the task with model attributions generated by black box interpretation techniques, then uses a simple calibrator, in the form of a classifier, to predict whether the base model was correct or not. We experiment with our method on two tasks, extractive question answering and natural language inference, covering adaptation from several pairs of domains with…
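
The pipeline described in the abstract (extract features from explanations, then train a classifier to predict whether the base model was correct) might be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the feature set (base-model confidence plus the share of attribution mass on "content" tokens), the helper names `featurize`, `train_calibrator`, and `predict_correct`, the plain logistic regression, and all data are hypothetical.

```python
import math


def featurize(confidence, attributions, content_tokens):
    """Hypothetical features: bias term, base-model confidence, and the share of
    absolute attribution mass that falls on tokens a human deems important."""
    total = sum(abs(a) for a in attributions.values()) or 1.0
    on_content = sum(abs(a) for tok, a in attributions.items() if tok in content_tokens)
    return [1.0, confidence, on_content / total]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_correct(weights, features):
    """Estimated probability that the base model's prediction is correct."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train_calibrator(X, y, lr=0.5, epochs=2000):
    """Logistic regression fit by batch gradient descent (assumed calibrator form)."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            err = predict_correct(weights, features) - label
            for j, x_j in enumerate(features):
                grad[j] += err * x_j
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Made-up target-domain examples: (base confidence, token attributions, was base model correct?)
content = {"wrote", "hamlet"}
examples = [
    (0.9, {"who": 0.1, "wrote": 0.5, "hamlet": 0.4}, 1),
    (0.8, {"who": 0.2, "wrote": 0.3, "hamlet": 0.5}, 1),
    (0.7, {"who": 0.7, "wrote": 0.2, "hamlet": 0.1}, 0),
    (0.6, {"who": 0.8, "wrote": 0.1, "hamlet": 0.1}, 0),
]
X = [featurize(conf, attrs, content) for conf, attrs, _ in examples]
y = [label for _, _, label in examples]
weights = train_calibrator(X, y)

# Score two unseen predictions with the trained calibrator.
p_good = predict_correct(weights, featurize(0.85, {"who": 0.15, "wrote": 0.45, "hamlet": 0.4}, content))
p_bad = predict_correct(weights, featurize(0.65, {"who": 0.75, "wrote": 0.15, "hamlet": 0.1}, content))
```

The base model is only queried for its outputs and attributions, never for its weights, which is what keeps the setup compatible with a black box deployment.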
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference
