Probing Neural Network Comprehension of Natural Language Arguments

Timothy Niven; Hung-Yu Kao

arXiv:1907.07355 · cs.CL · September 17, 2019 · 34 cites

TL;DR

This paper reveals that BERT's high performance on an argument comprehension task is due to dataset biases, and introduces an adversarial dataset to better evaluate true understanding.

Contribution

It uncovers spurious statistical cues in the Argument Reasoning Comprehension Task and builds an adversarial dataset that assesses model understanding more accurately.

Findings

01 BERT's peak accuracy of 77% is entirely accounted for by exploitation of spurious statistical cues in the dataset.

02 All models tested achieve random accuracy on the adversarial dataset.

03 The adversarial dataset provides a more robust assessment of argument comprehension.

Abstract

We are surprised to find that BERT's peak performance of 77% on the Argument Reasoning Comprehension Task reaches just three points below the average untrained human baseline. However, we show that this result is entirely accounted for by exploitation of spurious statistical cues in the dataset. We analyze the nature of these cues and demonstrate that a range of models all exploit them. This analysis informs the construction of an adversarial dataset on which all models achieve random accuracy. Our adversarial dataset provides a more robust assessment of argument comprehension and should be adopted as the standard in future work.
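To make the idea of a spurious statistical cue concrete, the following is a minimal sketch of one way to measure how much signal a single token carries on its own. It assumes a binary-choice setup in which each example offers two candidate warrants and a label marking the correct one. The `Example` class, the `cue_stats` function, and the toy data are all hypothetical and are not taken from the authors' repository. Coverage is the share of examples in which the token appears in exactly one warrant. Productivity is the share of those cases in which picking the warrant containing the token gives the right answer.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record layout: two candidate warrants and the index
    # (0 or 1) of the correct one.
    warrant0: str
    warrant1: str
    label: int


def cue_stats(examples, cue):
    """Return (coverage, productivity) of a single-token cue."""
    applicable = 0   # cue appears in exactly one of the two warrants
    productive = 0   # ...and that warrant is the correct one
    for ex in examples:
        in0 = cue in ex.warrant0.lower().split()
        in1 = cue in ex.warrant1.lower().split()
        if in0 == in1:
            continue  # cue gives no signal for this example
        applicable += 1
        predicted = 0 if in0 else 1
        if predicted == ex.label:
            productive += 1
    coverage = applicable / len(examples) if examples else 0.0
    productivity = productive / applicable if applicable else 0.0
    return coverage, productivity


# Toy data, invented purely for illustration.
toy = [
    Example("it is not safe", "it is safe", 0),
    Example("people can afford it", "people can not afford it", 1),
    Example("it does not matter", "it matters", 1),
    Example("rain is likely", "rain is unlikely", 0),
]

coverage, productivity = cue_stats(toy, "not")
print(f"coverage={coverage:.2f} productivity={productivity:.2f}")
```

With two candidate warrants, a productivity well above 0.5 on a cue with substantial coverage suggests that a model could beat chance by looking at the warrants alone, without reading the claim or the reason.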

Code & Models

Repositories

IKMLab/arct2 (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research