On the Transferability of Minimal Prediction Preserving Inputs in Question Answering
Shayne Longpre, Yi Lu, Christopher DuBois

TL;DR
This paper investigates the transferability and invariance of minimal prediction preserving inputs (MPPIs) in question answering models. It finds that MPPIs persist across training seeds, architectures, and domains, which challenges their use as interpretable indicators of model generalization.
Contribution
The study uncovers the surprising invariance and transferability of MPPIs across models and domains, questioning their use as interpretability tools for model generalization.
Findings
MPPIs are invariant to training seed, architecture, pretraining, and domain.
MPPIs transfer across domains, achieving significantly higher performance than comparably short queries.
Penalizing over-confidence on MPPIs improves neither generalization nor adversarial robustness.
Abstract
Recent work (Feng et al., 2018) establishes the presence of short, uninterpretable input fragments that yield high confidence and accuracy in neural models. We refer to these as Minimal Prediction Preserving Inputs (MPPIs). In the context of question answering, we investigate competing hypotheses for the existence of MPPIs, including poor posterior calibration of neural models, lack of pretraining, and "dataset bias" (where a model learns to attend to spurious, non-generalizable cues in the training data). We discover a perplexing invariance of MPPIs to random training seed, model architecture, pretraining, and training domain. MPPIs demonstrate remarkable transferability across domains, achieving significantly higher performance than comparably short queries. Additionally, penalizing over-confidence on MPPIs fails to improve either generalization or adversarial robustness. These results…
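To make the notion of an MPPI concrete, the following is a minimal sketch of greedy input reduction: tokens are dropped one at a time for as long as the model's prediction stays the same. It is an illustration under stated assumptions, not the authors' procedure. The names `reduce_input` and `toy_predict` are hypothetical, the "model" is a hand-written rule standing in for a real QA system, and token choice uses leave-one-out confidence rather than any particular importance measure from the cited work.

```python
from typing import Callable, List, Tuple

# A predictor maps a token list to (predicted label, confidence).
Predictor = Callable[[List[str]], Tuple[str, float]]


def reduce_input(tokens: List[str], predict: Predictor) -> List[str]:
    """Greedily drop tokens while the predicted label is unchanged."""
    original_label, _ = predict(tokens)
    current = list(tokens)
    while len(current) > 1:
        best = None  # (confidence, candidate) for the best removal so far
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            label, confidence = predict(candidate)
            # Only removals that preserve the original prediction qualify.
            if label == original_label and (best is None or confidence > best[0]):
                best = (confidence, candidate)
        if best is None:
            break  # every further removal would change the prediction
        current = best[1]
    return current


def toy_predict(tokens: List[str]) -> Tuple[str, float]:
    # Hypothetical stand-in for a QA model: it keys on a single cue word
    # and grows more confident as the input gets shorter.
    if "where" in tokens:
        return "location", 0.5 + 0.4 / len(tokens)
    return "other", 0.5


question = "where did the conference take place".split()
mppi = reduce_input(question, toy_predict)
print(mppi)
```

With this toy predictor the question collapses to a single cue word while the predicted label is preserved, which mirrors the short, uninterpretable fragments the abstract describes. A real experiment would replace `toy_predict` with a trained QA model.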
Taxonomy
Methods, Interpretability
