TL;DR
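This paper introduces a hypothesis-only baseline for NLI: a model that never sees the context and predicts the label from the hypothesis alone. Across ten NLI datasets, this baseline significantly outperforms a majority-class baseline on a number of them, which points to dataset biases rather than genuine inference over the context.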
Contribution
The paper demonstrates that hypothesis-only models significantly outperform majority-class baselines on a number of NLI datasets, exposing dataset biases and the need for evaluation methods that account for them.
Findings
Hypothesis-only models significantly outperform majority-class baselines on a number of the ten NLI datasets tested.
Statistical irregularities in some datasets let a model predict labels without access to the context.
Analysis suggests biases may inflate perceived model performance.
Abstract
We propose a hypothesis-only baseline for diagnosing Natural Language Inference (NLI). Especially when an NLI dataset assumes inference is occurring based purely on the relationship between a context and a hypothesis, it follows that assessing entailment relations while ignoring the provided context is a degenerate solution. Yet, through experiments on ten distinct NLI datasets, we find that this approach, which we refer to as a hypothesis-only model, is able to significantly outperform a majority class baseline across a number of NLI datasets. Our analysis suggests that statistical irregularities may allow a model to perform NLI in some datasets beyond what should be achievable without access to the context.
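The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it swaps in a small naive Bayes classifier over hypothesis tokens as a stand-in, and the names `HypothesisOnlyNB`, `majority_label`, `TRAIN`, and `TEST` are hypothetical. The handful of examples is invented toy data in which the word "nobody" correlates with the contradiction label, so the numbers it yields say nothing about any real dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def majority_label(labels):
    """Majority-class baseline: the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


class HypothesisOnlyNB:
    """Naive Bayes over hypothesis tokens (illustrative stand-in, not the paper's model)."""

    def fit(self, hypotheses, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for hyp, lab in zip(hypotheses, labels):
            self.word_counts[lab].update(tokenize(hyp))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, hypothesis):
        # Tokens never seen in training are skipped.
        tokens = [t for t in tokenize(hypothesis) if t in self.vocab]
        n = sum(self.label_counts.values())
        scores = {}
        for lab, count in self.label_counts.items():
            total = sum(self.word_counts[lab].values())
            score = math.log(count / n)  # log prior
            for t in tokens:
                # Add-one smoothed log likelihood of the token given the label.
                score += math.log((self.word_counts[lab][t] + 1) / (total + len(self.vocab)))
            scores[lab] = score
        return max(scores, key=scores.get)


def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy (context, hypothesis, label) triples, invented for illustration only.
TRAIN = [
    ("A man plays guitar on stage.", "A man is playing an instrument.", "entailment"),
    ("A man plays guitar on stage.", "Nobody is playing music.", "contradiction"),
    ("A dog runs through a field.", "An animal is outdoors.", "entailment"),
    ("A dog runs through a field.", "Nobody is outside.", "contradiction"),
    ("Two kids sit on a bench.", "Some children are sitting.", "entailment"),
]
TEST = [
    ("A woman reads a book.", "Nobody is reading.", "contradiction"),
    ("A chef cooks dinner.", "Nobody is cooking.", "contradiction"),
    ("A cat rests on a sofa.", "An animal is sitting.", "entailment"),
]

train_labels = [label for _, _, label in TRAIN]
gold = [label for _, _, label in TEST]

# The majority baseline predicts one label for every test example.
majority = majority_label(train_labels)
majority_acc = accuracy([majority] * len(TEST), gold)

# The hypothesis-only model is fit and evaluated without ever reading a context.
model = HypothesisOnlyNB().fit([hyp for _, hyp, _ in TRAIN], train_labels)
hyp_only_acc = accuracy([model.predict(hyp) for _, hyp, _ in TEST], gold)

print(f"majority baseline accuracy: {majority_acc:.2f}")
print(f"hypothesis-only accuracy:   {hyp_only_acc:.2f}")
```

If a hypothesis-only model of this kind beats the majority baseline on a real dataset, the gap measures how much label information leaks through the hypotheses alone, which is the diagnostic use the abstract proposes.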
