Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

Himabindu Lakkaraju; Ece Kamar; Rich Caruana; Eric Horvitz

arXiv:1610.09064 · cs.AI · December 13, 2016 · 44 cites

Open Access

TL;DR

This paper introduces a model-agnostic, two-phase framework for guided exploration to identify unknown unknowns in predictive models caused by training data biases, enhancing model robustness.

Contribution

It presents the first algorithmic approach for discovering unknown unknowns using an explore-exploit strategy guided by oracle feedback.

Findings

01 Effective identification of unknown unknowns across diverse applications

02 Data organized into partitions based on feature similarity and confidence scores

03 Guided exploration improves discovery efficiency

Abstract

Predictive models deployed in the real world may assign incorrect labels to instances with high confidence. Such errors or unknown unknowns are rooted in model incompleteness, and typically arise because of the mismatch between training data and the cases encountered at test time. As the models are blind to such errors, input from an oracle is needed to identify these failures. In this paper, we formulate and address the problem of informed discovery of unknown unknowns of any given predictive model where unknown unknowns occur due to systematic biases in the training data. We propose a model-agnostic methodology which uses feedback from an oracle to both identify unknown unknowns and to intelligently guide the discovery. We employ a two-phase approach which first organizes the data into multiple partitions based on the feature similarity of instances and the confidence scores assigned…
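
The explore-exploit idea described above can be illustrated with a small sketch. This is not the paper's algorithm: it substitutes a standard UCB1 bandit over partitions, and it assumes the first phase has already produced the partitions. The names `explore_partitions`, `partitions`, and `oracle_is_error` are hypothetical and introduced only for illustration. Each query picks one partition, draws an unseen instance from it, and asks the oracle whether the model's confident prediction was wrong.

```python
import math
import random
from typing import Callable, Dict, Hashable, List


def explore_partitions(
    partitions: Dict[Hashable, List[int]],
    oracle_is_error: Callable[[int], bool],
    budget: int,
    seed: int = 0,
) -> List[int]:
    """Spend `budget` oracle queries across partitions using UCB1.

    Assumption: `partitions` maps a partition id to instance ids and was
    built beforehand (e.g. from feature similarity and confidence scores).
    Assumption: `oracle_is_error(i)` returns True when instance `i` is a
    confidently mislabeled case, i.e. an unknown unknown.
    """
    rng = random.Random(seed)
    # Shuffled copies let us sample each partition without replacement.
    pools = {k: rng.sample(v, len(v)) for k, v in partitions.items() if v}
    pulls = {k: 0 for k in pools}
    hits = {k: 0 for k in pools}
    found: List[int] = []

    for t in range(1, budget + 1):
        active = [k for k in pools if pools[k]]
        if not active:
            break  # every partition is exhausted

        def ucb(k: Hashable) -> float:
            if pulls[k] == 0:
                return math.inf  # try every partition at least once
            mean = hits[k] / pulls[k]
            bonus = math.sqrt(2.0 * math.log(t) / pulls[k])
            return mean + bonus

        arm = max(active, key=ucb)
        instance = pools[arm].pop()
        pulls[arm] += 1
        if oracle_is_error(instance):
            hits[arm] += 1
            found.append(instance)

    return found


# Toy usage: partition "b" contains only errors, partition "a" contains none.
toy_partitions = {"a": list(range(0, 50)), "b": list(range(50, 100))}
toy_errors = set(range(50, 100))
discovered = explore_partitions(toy_partitions, lambda i: i in toy_errors, budget=20)
```

In the toy setup most of the query budget ends up in partition "b", because its observed error rate stays high while the exploration bonus only occasionally sends a query back to "a".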

Taxonomy

Topics: Machine Learning and Data Classification · Data Stream Mining Techniques · Bayesian Modeling and Causal Inference