Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation

Adam Poliak; Aparajita Haldar; Rachel Rudinger; J. Edward Hu; Ellie Pavlick; Aaron Steven White; Benjamin Van Durme

arXiv:1804.08207 · cs.CL · August 30, 2018

TL;DR

This paper introduces the DNC, a large, diverse collection of NLI datasets. The datasets are recast from existing datasets that cover a range of semantic phenomena, so sentence representations can be evaluated across different types of reasoning.

Contribution

The paper presents the DNC, a new collection of over half a million NLI pairs spanning multiple semantic phenomena. It makes it possible to test how well a sentence representation captures each distinct type of reasoning.
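All of the recast datasets share one labeling structure. That means a representation's performance can be reported separately for each phenomenon instead of as one pooled score.

The sketch below shows such a per-phenomenon breakdown. It rests on several assumptions, none of which come from the DNC itself:

- the `NLIPair` record and its field names;
- the binary "entailed"/"not-entailed" label scheme;
- the `accuracy_by_phenomenon` helper.

These are hypothetical illustrations, not the DNC's released file format or the authors' evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NLIPair:
    """One context-hypothesis pair (hypothetical field names, for illustration only)."""
    context: str
    hypothesis: str
    label: str       # assumed binary scheme: "entailed" or "not-entailed"
    phenomenon: str  # semantic phenomenon targeted by the source dataset


def accuracy_by_phenomenon(pairs, predictions):
    """Return {phenomenon: accuracy} for parallel lists of gold pairs and predicted labels."""
    if len(pairs) != len(predictions):
        raise ValueError("pairs and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pair, pred in zip(pairs, predictions):
        total[pair.phenomenon] += 1
        correct[pair.phenomenon] += int(pred == pair.label)
    # Accuracy is computed independently within each phenomenon.
    return {name: correct[name] / total[name] for name in total}
```

Reporting accuracy per phenomenon keeps a strong result on one large dataset from hiding a weak result on a smaller one.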

Findings

1. The DNC covers 7 semantic phenomena, recast from 13 existing datasets.
2. The collection contains over 500,000 labeled context-hypothesis pairs.
3. The DNC is available online for community use at https://www.decomp.net.

Abstract

We present a large-scale collection of diverse natural language inference (NLI) datasets that help provide insight into how well a sentence representation captures distinct types of reasoning. The collection is built by recasting 13 existing datasets, covering 7 semantic phenomena, into a common NLI structure. This yields over half a million labeled context-hypothesis pairs in total. We refer to our collection as the DNC: Diverse Natural Language Inference Collection. The DNC is available online at https://www.decomp.net and will grow over time as additional resources are recast and added from novel sources.
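Recasting converts an existing annotation into labeled context-hypothesis pairs. As a toy illustration of the idea, assume a named-entity-style source annotation. The sketch below uses fixed templates to turn one (sentence, entity, type) triple into one entailed pair and several not-entailed pairs.

The following are hypothetical and do not represent the authors' actual recasting procedure:

- the `TYPE_PHRASES` templates;
- the `recast_entity` function;
- the `NLIPair` record;
- the binary label names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NLIPair:
    """A context-hypothesis pair in a shared format (hypothetical field names)."""
    context: str
    hypothesis: str
    label: str  # assumed binary scheme: "entailed" or "not-entailed"


# Hypothetical templates mapping an entity type to a hypothesis phrase.
TYPE_PHRASES = {
    "PERSON": "a person",
    "LOCATION": "a location",
    "ORGANIZATION": "an organization",
}


def recast_entity(sentence, entity, gold_type):
    """Turn one (sentence, entity, type) annotation into labeled NLI pairs.

    The gold type yields an entailed hypothesis; every other type in
    TYPE_PHRASES yields a not-entailed hypothesis for the same context.
    """
    if gold_type not in TYPE_PHRASES:
        raise KeyError(f"unknown entity type: {gold_type}")
    pairs = []
    for etype, phrase in TYPE_PHRASES.items():
        label = "entailed" if etype == gold_type else "not-entailed"
        pairs.append(NLIPair(sentence, f"{entity} is {phrase}.", label))
    return pairs
```

For example, `recast_entity("Ada moved to London in 1830.", "London", "LOCATION")` returns three pairs that share that context. Only "London is a location." is labeled entailed.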
