Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie Pavlick, Aaron Steven White, Benjamin Van Durme

TL;DR
This paper introduces the DNC, a large, diverse collection of NLI datasets recast from various semantic phenomena, enabling comprehensive evaluation of sentence representations across different reasoning types.
Contribution
The paper presents the DNC, a collection of over half a million NLI pairs recast from 13 existing datasets spanning 7 semantic phenomena, which supports evaluating how well sentence representations capture distinct types of reasoning.
Findings
The DNC recasts 13 existing datasets covering 7 semantic phenomena.
The collection contains over half a million labeled context-hypothesis pairs.
The DNC is available online at https://www.decomp.net.
Abstract
We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection results from recasting 13 existing datasets from 7 semantic phenomena into a common NLI structure, resulting in over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net, and will grow over time as additional resources are recast and added from novel sources.
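To make the idea of recasting concrete, the following is a minimal sketch of how one annotated example might be turned into context-hypothesis pairs with a common structure. It is an illustration only, not the authors' pipeline: the `NLIPair` class, the `recast_sentiment` function, the sentence templates, and the label strings "entailed" and "not-entailed" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NLIPair:
    """One labeled context-hypothesis pair (hypothetical schema)."""
    context: str
    hypothesis: str
    label: str  # assumed label names: "entailed" or "not-entailed"


def recast_sentiment(text: str, is_positive: bool) -> List[NLIPair]:
    """Recast one sentiment-labeled sentence into two NLI pairs.

    The templates below are invented for illustration; they are not
    taken from the DNC.
    """
    context = f'When asked about the product, the reviewer said "{text}"'
    pairs = []
    # Build one hypothesis per polarity; exactly one matches the annotation.
    for verb_phrase, matches in (("liked", is_positive),
                                 ("did not like", not is_positive)):
        hypothesis = f"The reviewer {verb_phrase} the product."
        label = "entailed" if matches else "not-entailed"
        pairs.append(NLIPair(context, hypothesis, label))
    return pairs


if __name__ == "__main__":
    for pair in recast_sentiment("It works great.", is_positive=True):
        print(pair.label, "|", pair.hypothesis)
```

Each source annotation yields one pair per candidate hypothesis, so a single labeled sentence produces both an entailed and a not-entailed example under this assumed scheme.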
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
