Misleading Failures of Partial-input Baselines

Shi Feng; Eric Wallace; Jordan Boyd-Graber

arXiv:1905.05778 · cs.LG · June 19, 2019 · 6 cites

TL;DR

This paper investigates the limitations of partial-input baselines in dataset evaluation, showing they can be misleading and may not detect all artifacts, with implications for dataset verification and creation.

Contribution

It demonstrates that partial-input baselines can fail to identify certain dataset artifacts, challenging their reliability for dataset difficulty assessment.

Findings

01

Partial-input baselines can fail (score low) even when datasets contain artifacts.

02

Artificial datasets with hidden trivial patterns evade partial-input detection.

03

A hypothesis-only model augmented with trivial patterns in the premise can solve 15% of the SNLI examples previously considered "hard".
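A figure like "15% of hard examples" is a rate over a subset. A minimal sketch of that arithmetic follows, assuming the common SNLI convention that the "hard" subset is the set of examples a hypothesis-only baseline misclassifies. The function name `hard_subset_solve_rate` and the toy predictions are hypothetical illustrations, not the paper's code or data.

```python
from typing import Sequence


def hard_subset_solve_rate(
    labels: Sequence[str],
    partial_preds: Sequence[str],
    augmented_preds: Sequence[str],
) -> float:
    """Fraction of 'hard' examples that an augmented model gets right.

    Assumption: 'hard' means the partial-input (hypothesis-only)
    baseline misclassifies the example.
    """
    if not (len(labels) == len(partial_preds) == len(augmented_preds)):
        raise ValueError("all sequences must have the same length")
    # Indices where the partial-input baseline is wrong.
    hard = [i for i, (y, p) in enumerate(zip(labels, partial_preds)) if p != y]
    if not hard:
        return 0.0
    # Count hard examples the augmented model labels correctly.
    solved = sum(1 for i in hard if augmented_preds[i] == labels[i])
    return solved / len(hard)


# Toy usage with made-up predictions (not results from the paper).
labels = ["entailment", "contradiction", "neutral", "entailment"]
hyp_only = ["entailment", "entailment", "entailment", "neutral"]        # wrong on 1, 2, 3
augmented = ["entailment", "contradiction", "entailment", "neutral"]    # fixes index 1 only
print(hard_subset_solve_rate(labels, hyp_only, augmented))  # 1 of 3 hard examples solved
```

Measuring only on the hard subset isolates what the augmented model adds beyond the hypothesis-only baseline.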

Abstract

Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA). When a partial-input baseline achieves high accuracy, the dataset is cheatable. However, the converse is not necessarily true: the failure of a partial-input baseline does not mean a dataset is free of artifacts. To illustrate this, we first design artificial datasets which contain trivial patterns in the full input that are undetectable by any partial-input model. Next, we identify such artifacts in the SNLI dataset: a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of the examples that were previously considered "hard". Our work provides a caveat for the use of partial-input baselines for dataset verification and creation.
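Why can a trivial full-input pattern be invisible to every partial-input model? A minimal sketch, using an XOR construction as an assumed illustration (not necessarily the artificial datasets used in the paper): each example has a premise bit and a hypothesis bit, and the label is 1 exactly when they differ. Either bit alone is independent of the label, so even the best hypothesis-only classifier stays near chance, while a one-line rule over both inputs is perfect. All function names here are hypothetical.

```python
import random
from collections import Counter, defaultdict


def make_xor_dataset(n: int, seed: int = 0):
    """Hypothetical artificial dataset of (premise_bit, hypothesis_bit, label).

    The label is premise_bit XOR hypothesis_bit, so it depends on both sides.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        p = rng.randint(0, 1)
        h = rng.randint(0, 1)
        data.append((p, h, p ^ h))
    return data


def best_partial_accuracy(data, use_premise: bool) -> float:
    """Accuracy of the best classifier that sees only one side, on this sample.

    Such a classifier can only map each value of the visible bit to a label,
    so predicting the majority label per value is optimal.
    """
    counts = defaultdict(Counter)
    for p, h, y in data:
        counts[p if use_premise else h][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(data)


def full_input_rule_accuracy(data) -> float:
    """Trivial full-input pattern: predict 1 exactly when the two bits differ."""
    return sum((p != h) == bool(y) for p, h, y in data) / len(data)


data = make_xor_dataset(10_000)
print(round(best_partial_accuracy(data, use_premise=False), 2))  # hypothesis-only: near 0.5
print(round(best_partial_accuracy(data, use_premise=True), 2))   # premise-only: near 0.5
print(full_input_rule_accuracy(data))                            # full input: 1.0
```

Both partial-input baselines "fail", yet the dataset is solved by a trivial pattern, which is the asymmetry the abstract describes.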

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning