SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space
Lasha Abzianidze, Joost Zwarts, Yoad Winter

TL;DR
SpaceNLI is a new dataset for evaluating how well natural language inference (NLI) systems handle diverse types of spatial reasoning. Experiments on it reveal the limitations of current models, particularly on certain kinds of spatial inferences.
Contribution
This paper introduces SpaceNLI, to the authors' knowledge the first NLI dataset focused on spatial reasoning, along with a new evaluation metric called Pattern Accuracy.
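Because SpaceNLI samples are generated from patterns, a metric can score a model per pattern rather than per sample. The sketch below assumes one plausible reading of Pattern Accuracy, which may differ from the paper's exact definition: a pattern counts as solved when the model's accuracy on that pattern's samples reaches a threshold t, and Pattern Accuracy is the fraction of solved patterns. The function names and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A record is (pattern_id, gold_label, predicted_label); this layout is an assumption.
Record = Tuple[str, str, str]


def sample_accuracy(records: Iterable[Record]) -> float:
    """Plain accuracy: fraction of individual samples predicted correctly."""
    hits = [gold == pred for _, gold, pred in records]
    return sum(hits) / len(hits) if hits else 0.0


def pattern_accuracy(records: Iterable[Record], threshold: float) -> float:
    """Fraction of patterns whose per-pattern sample accuracy is >= threshold."""
    per_pattern: Dict[str, List[bool]] = defaultdict(list)
    for pattern_id, gold, pred in records:
        per_pattern[pattern_id].append(gold == pred)
    if not per_pattern:
        return 0.0
    solved = sum(
        1 for hits in per_pattern.values() if sum(hits) / len(hits) >= threshold
    )
    return solved / len(per_pattern)


# Hypothetical toy predictions: p1 is fully solved, p2 is 3/4 correct, p3 is 0/2.
RECORDS: List[Record] = [
    ("p1", "entailment", "entailment"),
    ("p1", "entailment", "entailment"),
    ("p1", "entailment", "entailment"),
    ("p2", "contradiction", "contradiction"),
    ("p2", "contradiction", "contradiction"),
    ("p2", "contradiction", "contradiction"),
    ("p2", "contradiction", "neutral"),
    ("p3", "neutral", "entailment"),
    ("p3", "neutral", "entailment"),
]

if __name__ == "__main__":
    print(f"sample accuracy: {sample_accuracy(RECORDS):.3f}")
    for t in (0.5, 0.75, 1.0):
        print(f"pattern accuracy @ {t}: {pattern_accuracy(RECORDS, t):.3f}")
```

On this toy data, sample accuracy is 6/9, while Pattern Accuracy at t = 1.0 is only 1/3, because a single wrong prediction on p2 is enough to fail that pattern. This illustrates why a per-pattern score can be stricter than plain accuracy.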
Findings
Current NLI systems perform moderately on SpaceNLI.
Models struggle with non-projective spatial inferences, especially those involving the preposition 'between'.
Pattern-specific analysis reveals gaps in spatial reasoning capabilities.
Abstract
While many natural language inference (NLI) datasets target specific semantic phenomena, e.g., negation, tense & aspect, monotonicity, and presupposition, to the best of our knowledge there is no NLI dataset that involves diverse types of spatial expressions and reasoning. We fill this gap by semi-automatically creating an NLI dataset for spatial reasoning, called SpaceNLI. The data samples are automatically generated from a curated set of reasoning patterns, which experts have annotated with inference labels. We test several SOTA NLI systems on SpaceNLI to gauge the complexity of the dataset and the systems' capacity for spatial reasoning. Moreover, we introduce Pattern Accuracy and argue that it is a more reliable and stricter measure than standard accuracy for evaluating a system's performance on data samples generated from patterns. Based on the evaluation results, we…
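The abstract describes generating samples from expert-labeled reasoning patterns. The following is a minimal sketch, assuming a pattern is a premise/hypothesis template with named slots filled from word lists, and that every instance inherits the pattern's label. The `Pattern`, `Sample`, and `instantiate` names, the example pattern, and the filler lists are all hypothetical; the paper's actual generation procedure (e.g., how it constrains which fillers fit which slots) may differ.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass(frozen=True)
class Pattern:
    premise: str     # template with named slots, e.g. "{x} is in {y}."
    hypothesis: str
    label: str       # expert-annotated label, shared by every instance


@dataclass(frozen=True)
class Sample:
    premise: str
    hypothesis: str
    label: str
    pattern_id: str  # kept so results can later be grouped per pattern


def instantiate(pattern_id: str, pattern: Pattern,
                fillers: Dict[str, List[str]]) -> List[Sample]:
    """Fill every slot with every combination of candidate fillers."""
    slots = sorted(fillers)
    samples = []
    for values in product(*(fillers[s] for s in slots)):
        binding = dict(zip(slots, values))
        samples.append(Sample(
            premise=pattern.premise.format(**binding),
            hypothesis=pattern.hypothesis.format(**binding),
            label=pattern.label,
            pattern_id=pattern_id,
        ))
    return samples


# Hypothetical containment pattern: "in" chains transitively.
IN_TRANSITIVE = Pattern(
    premise="{x} is in {y}, and {y} is in {z}.",
    hypothesis="{x} is in {z}.",
    label="entailment",
)
FILLERS = {
    "x": ["The key", "The coin"],
    "y": ["the box"],
    "z": ["the drawer", "the car"],
}

if __name__ == "__main__":
    for s in instantiate("in-transitive", IN_TRANSITIVE, FILLERS):
        print(s.premise, "=>", s.hypothesis, f"[{s.label}]")
```

Keeping `pattern_id` on each sample is what makes a per-pattern metric such as Pattern Accuracy possible, since all instances of one pattern share a single gold label.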
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Constraint Satisfaction and Optimization
