Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding
Shane Storks, Qiaozi Gao, Yichi Zhang, Joyce Chai

TL;DR
This paper introduces TRIP, a dataset for evaluating multi-tiered reasoning about intuitive physics in language models, and shows that high end-task performance does not guarantee that predictions are backed by valid supporting evidence.
Contribution
The paper presents TRIP, a novel dataset with dense annotations for multi-tiered reasoning evaluation, highlighting the gap between performance and reasoning validity in large language models.
Findings
Large LMs achieve high end-task performance.
The same models struggle to support their predictions with valid evidence.
The TRIP dataset enables verifiable evaluation of reasoning.
Abstract
Large-scale, pre-trained language models (LMs) have achieved human-level performance on a breadth of language understanding tasks. However, evaluations only based on end task performance shed little light on machines' true ability in language understanding and reasoning. In this paper, we highlight the importance of evaluating the underlying reasoning process in addition to end performance. Toward this goal, we introduce Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines' reasoning process. Our empirical results show that while large LMs can achieve high end performance, they struggle to support their predictions with valid supporting evidence. The TRIP dataset and our baseline results will motivate verifiable evaluation of commonsense reasoning and facilitate future research toward…
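To make the idea of multi-tiered evaluation concrete, here is a minimal sketch of cumulative scoring, in which a prediction earns credit at a deeper tier only if it was also correct at every shallower tier. The three tier names, the `TieredExample` fields, and the `tiered_scores` function are hypothetical and chosen for illustration only; they are not the paper's metric definitions or annotation schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredExample:
    """One evaluated item. All field names are hypothetical."""
    end_label_correct: bool      # tier 1: was the end-task prediction right?
    evidence_correct: bool       # tier 2: was the cited supporting evidence right?
    justification_correct: bool  # tier 3: was the finer-grained justification right?


def tiered_scores(examples):
    """Return the fraction of examples that pass each cumulative tier."""
    n = len(examples)
    if n == 0:
        raise ValueError("need at least one example")
    # A deeper tier only counts when all shallower tiers are also correct.
    tier1 = sum(e.end_label_correct for e in examples)
    tier2 = sum(e.end_label_correct and e.evidence_correct for e in examples)
    tier3 = sum(
        e.end_label_correct and e.evidence_correct and e.justification_correct
        for e in examples
    )
    return {
        "end_task": tier1 / n,
        "with_evidence": tier2 / n,
        "fully_verified": tier3 / n,
    }


# Toy data (made up): end-task accuracy looks high, deeper tiers do not.
examples = [
    TieredExample(True, True, True),
    TieredExample(True, True, False),
    TieredExample(True, False, False),
    TieredExample(False, False, False),
]
scores = tiered_scores(examples)
print(scores)
```

On this toy data the scores drop at each tier, which is the kind of gap between end performance and supported reasoning that the abstract describes. Scoring cumulatively, instead of scoring each tier independently, prevents a model from getting credit for evidence attached to a wrong prediction.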
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
