Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration
Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz

TL;DR
This paper introduces a model-agnostic, two-phase framework that uses oracle feedback to discover unknown unknowns: the high-confidence errors a predictive model makes because of systematic biases in its training data.
Contribution
It presents the first algorithmic approach for discovering unknown unknowns using an explore-exploit strategy guided by oracle feedback.
Findings
Effective identification of unknown unknowns across diverse applications
Data is first organized into partitions based on feature similarity and model confidence scores
Guided exploration improves discovery efficiency
Abstract
Predictive models deployed in the real world may assign incorrect labels to instances with high confidence. Such errors or unknown unknowns are rooted in model incompleteness, and typically arise because of the mismatch between training data and the cases encountered at test time. As the models are blind to such errors, input from an oracle is needed to identify these failures. In this paper, we formulate and address the problem of informed discovery of unknown unknowns of any given predictive model where unknown unknowns occur due to systematic biases in the training data. We propose a model-agnostic methodology which uses feedback from an oracle to both identify unknown unknowns and to intelligently guide the discovery. We employ a two-phase approach which first organizes the data into multiple partitions based on the feature similarity of instances and the confidence scores assigned…
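The two-phase idea in the abstract (partition first, then explore with oracle feedback) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a simple bucketing on one categorical feature plus a confidence bin for the partitioning phase, and swaps in the standard UCB1 bandit rule for the explore-exploit phase. The function names `partition` and `explore`, and the instance keys `feature`, `confidence`, `predicted`, and `true`, are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def partition(instances, n_conf_bins=2, min_conf=0.5):
    """Phase 1 stand-in (assumption): bucket instances by a categorical
    feature value and an equal-width confidence bin."""
    parts = defaultdict(list)
    width = (1.0 - min_conf) / n_conf_bins
    for inst in instances:
        raw_bin = int((inst["confidence"] - min_conf) / width)
        conf_bin = max(0, min(raw_bin, n_conf_bins - 1))  # clamp to valid bins
        parts[(inst["feature"], conf_bin)].append(inst)
    return dict(parts)


def explore(parts, oracle, budget, seed=0):
    """Phase 2 stand-in (assumption): UCB1 over partitions. Each query asks
    the oracle for the true label; the reward is 1 when the oracle disagrees
    with the model's confident prediction."""
    rng = random.Random(seed)
    pools = {key: list(members) for key, members in parts.items()}
    for pool in pools.values():
        rng.shuffle(pool)
    pulls = {key: 0 for key in pools}
    hits = {key: 0 for key in pools}
    found = []

    for t in range(1, budget + 1):
        live = [key for key in pools if pools[key]]
        if not live:
            break  # every partition has been fully queried

        def ucb(key):
            if pulls[key] == 0:
                return float("inf")  # try each partition at least once
            mean = hits[key] / pulls[key]
            return mean + math.sqrt(2 * math.log(t) / pulls[key])

        arm = max(live, key=ucb)
        inst = pools[arm].pop()
        pulls[arm] += 1
        if oracle(inst) != inst["predicted"]:
            hits[arm] += 1
            found.append(inst)
    return found, pulls
```

Called as `explore(partition(instances), oracle=lambda inst: inst["true"], budget=30)`, the sketch spends most of its query budget on partitions where earlier oracle answers exposed confident mistakes, which is the intuition behind guiding discovery rather than sampling uniformly.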
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Bayesian Modeling and Causal Inference
