Identifying and Benchmarking Natural Out-of-Context Prediction Problems

David Madras; Richard Zemel

arXiv:2110.13223 · cs.LG · October 27, 2021

TL;DR

This paper introduces a unified framework for measuring out-of-context prediction performance in deep learning, leveraging auxiliary information to identify challenging examples and analyzing how different benchmarks influence evaluation outcomes.

Contribution

It presents NOOCh, a suite of natural challenge sets, and demonstrates how context variations can reveal specific out-of-context failure modes in models.

Findings

01

Rich auxiliary information can be used to identify candidate OOC examples in existing datasets.

02

Choices made in designing OOC benchmarks can change the conclusions drawn from an evaluation.

03

Varying the notion of context probes different OOC failure modes.

Abstract

Deep learning systems frequently fail at out-of-context (OOC) prediction, the problem of making reliable predictions on uncommon or unusual inputs or subgroups of the training distribution. To this end, a number of benchmarks for measuring OOC performance have recently been introduced. In this work, we introduce a framework unifying the literature on OOC performance measurement, and demonstrate how rich auxiliary information can be leveraged to identify candidate sets of OOC examples in existing datasets. We present NOOCh: a suite of naturally-occurring "challenge sets", and show how varying notions of context can be used to probe specific OOC failure modes. Experimentally, we explore the tradeoffs between various learning approaches on these challenge sets and demonstrate how the choices made in designing OOC benchmarks can yield varying conclusions.
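
As a toy illustration of how auxiliary information might flag candidate OOC examples, the sketch below treats an example as a candidate when its context value is rare among examples sharing its label. This is not the NOOCh construction from the paper. The function name `candidate_ooc_indices`, the single binary context attribute, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter


def candidate_ooc_indices(labels, contexts, threshold=0.5):
    """Return indices whose context value is uncommon for their label.

    Hypothetical helper: `contexts` stands in for one auxiliary attribute
    (for example, "grass is present in the image").
    """
    pair_counts = Counter(zip(labels, contexts))
    label_counts = Counter(labels)
    flagged = []
    for i, (y, c) in enumerate(zip(labels, contexts)):
        # Fraction of examples with label y that share this context value.
        frac = pair_counts[(y, c)] / label_counts[y]
        if frac < threshold:
            flagged.append(i)
    return flagged


# Made-up toy data: label 1 usually co-occurs with context 1, label 0 with context 0.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
contexts = [1, 1, 1, 0, 0, 0, 0, 1]
print(candidate_ooc_indices(labels, contexts))  # [3, 7]
```

Changing which auxiliary attribute plays the role of `contexts`, or the threshold, changes which examples are flagged, which is one way to see why benchmark design choices can affect evaluation conclusions.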


Code & Models

Repositories

dmadras/nooch · PyTorch · Official

Videos

Identifying and Benchmarking Natural Out-of-Context Prediction Problems · SlidesLive

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Data Stream Mining Techniques · Machine Learning and Data Classification