What do we expect from Multiple-choice QA Systems?

Krunal Shah; Nitish Gupta; Dan Roth

arXiv:2011.10647 · cs.CL · November 24, 2020

TL;DR

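This paper evaluates a top-performing MCQA model against human-like expectations using zero-information input perturbations. It shows that the model falls short of those expectations and proposes a modified training method that forces the model to attend more closely to its inputs and align better with the expectations.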

Contribution

It introduces an evaluation approach for MCQA models based on zero-information perturbations of the model's inputs, and proposes a modified training paradigm that makes the model attend more closely to its inputs and better meet these expectations.

Findings

01

Original models fall short of expectations under perturbations.

02

Modified training improves model attention without sacrificing performance.

03

Models trained with the new paradigm better satisfy human-like expectations.

Abstract

The recent success of machine learning systems on various QA datasets could be interpreted as a significant improvement in models' language understanding abilities. However, using various perturbations, multiple recent works have shown that good performance on a dataset might not indicate performance that correlates well with humans' expectations of models that "understand" language. In this work, we consider a top-performing model on several Multiple Choice Question Answering (MCQA) datasets and evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs. Our results show that the model clearly falls short of our expectations, and they motivate a modified training approach that forces the model to better attend to the inputs. We show that the new training paradigm leads to a model that performs on par…
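The abstract does not list the specific perturbations, so the following is only a minimal sketch of the general idea, assuming a model exposed as a hypothetical `scorer(context, question, option)` function. A zero-information perturbation removes information the task needs (here, the context or the question). A model that actually uses its inputs should then fall toward chance accuracy. `MCQAExample`, `drop_context`, `drop_question`, and both toy scorers are illustrative inventions, not the paper's perturbation set or model.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

@dataclass(frozen=True)
class MCQAExample:
    context: str
    question: str
    options: List[str]
    gold: int  # index of the correct option

# Hypothetical scorer interface: higher score means a more plausible option.
Scorer = Callable[[str, str, str], float]

def predict(scorer: Scorer, ex: MCQAExample) -> int:
    """Return the index of the highest-scoring option (first one on ties)."""
    scores = [scorer(ex.context, ex.question, opt) for opt in ex.options]
    return max(range(len(scores)), key=scores.__getitem__)

# Illustrative zero-information perturbations (assumed, not the paper's exact set).
def drop_context(ex: MCQAExample) -> MCQAExample:
    return replace(ex, context="")

def drop_question(ex: MCQAExample) -> MCQAExample:
    return replace(ex, question="")

def accuracy(scorer: Scorer, examples: List[MCQAExample],
             perturb: Callable[[MCQAExample], MCQAExample] = lambda ex: ex) -> float:
    """Accuracy on perturbed inputs, still scored against the original gold label."""
    hits = sum(predict(scorer, perturb(ex)) == ex.gold for ex in examples)
    return hits / len(examples)

def _tokens(text: str) -> set:
    return {w.strip(".,?!").lower() for w in text.split()}

# Toy model 1: a shortcut that ignores context and question, preferring long options.
def length_scorer(context: str, question: str, option: str) -> float:
    return float(len(option))

# Toy model 2: counts option words that also appear in the context.
def overlap_scorer(context: str, question: str, option: str) -> float:
    return float(len(_tokens(option) & _tokens(context)))

examples = [
    MCQAExample("Paris is the capital of France.",
                "What is the capital of France?", ["Rome", "Paris"], 1),
    MCQAExample("The cat sat on the warm mat.",
                "Where did the cat sit?", ["on the warm mat", "tree"], 0),
]

if __name__ == "__main__":
    for name, scorer in [("length", length_scorer), ("overlap", overlap_scorer)]:
        print(name,
              accuracy(scorer, examples),
              accuracy(scorer, examples, drop_context),
              accuracy(scorer, examples, drop_question))
```

On these two toy examples, the length shortcut keeps full accuracy with the context removed, which is exactly the kind of unmet expectation the perturbations are meant to expose. The overlap scorer drops to chance without the context but still answers without the question, so it fails a different expectation.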

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications