Testing the Generalization Power of Neural Network Models Across NLI Benchmarks

Aarne Talman; Stergios Chatzikyriakidis

arXiv:1810.09774 · cs.CL · June 4, 2019

TL;DR

This paper investigates how poorly neural network models generalize across natural language inference (NLI) benchmarks, showing that models often fail to transfer between datasets even when those datasets assume the same or a similar notion of inference.

Contribution

The study trains six high-performing neural network models on different NLI datasets and evaluates each on test sets taken from other corpora designed for the same task, exposing poor cross-dataset generalization and the limitations of current datasets.

Findings

01. Models trained on one dataset perform poorly on others.

02. Large pre-trained models improve transfer when datasets are similar.

03. Current NLI datasets lack coverage of inference nuances.

Abstract

Neural network models have been very successful in natural language inference, with the best models reaching 90% accuracy in some benchmarks. However, the success of these models turns out to be largely benchmark specific. We show that models trained on a natural language inference dataset drawn from one benchmark fail to perform well in others, even if the notion of inference assumed in these benchmarks is the same or similar. We train six high performing neural network models on different datasets and show that each one of these has problems of generalizing when we replace the original test set with a test set taken from another corpus designed for the same task. In light of these results, we argue that most of the current neural network models are not able to generalize well in the task of natural language inference. We find that using large pre-trained language models helps with…
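The evaluation protocol described in the abstract (train on one corpus, then swap in a test set from another corpus built for the same task) can be sketched as a small accuracy matrix. This is a minimal illustration, not the authors' code: the `Example` and `Predictor` types, the function names, and the idea of keying each predictor by the name of its training corpus are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only:
# an example is (premise, hypothesis, gold_label).
Example = Tuple[str, str, str]
# A predictor maps (premise, hypothesis) to a predicted label.
Predictor = Callable[[str, str], str]


def accuracy(predict: Predictor, examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not examples:
        raise ValueError("test set is empty")
    correct = sum(predict(p, h) == gold for p, h, gold in examples)
    return correct / len(examples)


def cross_dataset_accuracy(
    predictors: Dict[str, Predictor],
    test_sets: Dict[str, List[Example]],
) -> Dict[Tuple[str, str], float]:
    """Score every predictor (keyed by its training corpus) on every test set."""
    return {
        (train_name, test_name): accuracy(predict, examples)
        for train_name, predict in predictors.items()
        for test_name, examples in test_sets.items()
    }


def generalization_gap(
    scores: Dict[Tuple[str, str], float], train_name: str, test_name: str
) -> float:
    """In-domain accuracy minus accuracy on another corpus's test set."""
    return scores[(train_name, train_name)] - scores[(train_name, test_name)]
```

In practice each predictor would wrap a model trained on the named corpus. A large positive `generalization_gap` for a pair of corpora corresponds to the benchmark-specific behaviour the abstract reports.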
