"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility
Himanshu Gupta, Neeraj Varshney, Swaroop Mishra, Kuntal Kumar Pal, Saurabh Arjun Sawant, Kevin Scaria, Siddharth Goyal, Chitta Baral

TL;DR
This paper introduces FeasibilityQA, a dataset to evaluate NLP models' understanding of action feasibility, revealing that even advanced models like GPT-3 struggle significantly with this commonsense reasoning task.
Contribution
Creation of the FeasibilityQA dataset and a comprehensive evaluation of the limitations of state-of-the-art models in understanding action feasibility.
Findings
GPT-3 achieves only 19% (MCQ) and 62% (BCQ) accuracy in the zero-shot setting, and 25% (MCQ) and 64% (BCQ) in the few-shot setting.
Providing relevant knowledge statements improves performance by 7%.
Models show limited reasoning about action feasibility.
Abstract
In current NLP research, large-scale language models and their abilities are being widely discussed. Some recent works have also found notable failures of these models. Often these failure examples involve complex reasoning abilities. This work focuses on a simple commonsense ability: reasoning about when an action (or its effect) is feasible. To this end, we introduce FeasibilityQA, a question-answering dataset of binary classification questions (BCQ) and multi-choice multi-correct questions (MCQ) that test understanding of feasibility. We show that even state-of-the-art models such as GPT-3, GPT-2, and T5 struggle to answer the feasibility questions correctly. Specifically, on MCQ and BCQ questions, GPT-3 achieves an accuracy of just (19%, 62%) and (25%, 64%) in zero-shot and few-shot settings, respectively. We also evaluate models by providing relevant knowledge statements required to…
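The abstract reports accuracy on two question formats. To illustrate why the multi-correct format can be much harder to score well on, here is a minimal Python sketch. It assumes that a BCQ answer is a single yes/no label and that an MCQ answer counts as correct only when the model selects exactly the set of gold options (exact match); the scoring rule actually used for FeasibilityQA is not stated in this summary. The class names, function names, and example questions are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Sequence, Tuple


@dataclass(frozen=True)
class BCQItem:
    # Hypothetical record: a yes/no feasibility question and its gold label.
    question: str
    gold: bool


@dataclass(frozen=True)
class MCQItem:
    # Hypothetical record: any subset of the options may be correct.
    question: str
    options: Tuple[str, ...]
    gold: FrozenSet[int]  # indices of every correct option


def bcq_accuracy(items: Sequence[BCQItem], preds: Sequence[bool]) -> float:
    # Fraction of yes/no predictions that match the gold label.
    if not items or len(items) != len(preds):
        raise ValueError("need one prediction per item")
    return sum(it.gold == p for it, p in zip(items, preds)) / len(items)


def mcq_exact_match_accuracy(
    items: Sequence[MCQItem], preds: Sequence[Iterable[int]]
) -> float:
    # Assumption: credit only if the predicted option set equals the gold set,
    # so selecting a subset or a superset of the correct options scores zero.
    if not items or len(items) != len(preds):
        raise ValueError("need one prediction per item")
    return sum(it.gold == frozenset(p) for it, p in zip(items, preds)) / len(items)


bcq_items = [
    BCQItem("John is 50 years old. Can his son be 65?", False),
    BCQItem("John is 50 years old. Can his son be 25?", True),
]
mcq_items = [
    MCQItem("John is 50 years old. Which ages are possible for his son?",
            ("10", "25", "65", "70"), frozenset({0, 1})),
    MCQItem("Mary is 30 years old. Which ages are possible for her mother?",
            ("20", "55", "60", "25"), frozenset({1, 2})),
]

if __name__ == "__main__":
    print(bcq_accuracy(bcq_items, [False, False]))                  # 0.5
    print(mcq_exact_match_accuracy(mcq_items, [{0, 1}, {2}]))       # 0.5
```

Under this assumed rule, the second MCQ prediction picks a correct option but misses another, so it earns no credit. This all-or-nothing scoring is one plausible reason MCQ accuracy can sit far below BCQ accuracy for the same model.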
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · SentencePiece · Discriminative Fine-Tuning · Inverse Square Root Schedule · Gated Linear Unit · Adafactor · T5
