CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning
Adam Dahlgren Lindström, Savitha Sam Abraham

TL;DR
CLEVR-Math is a new multi-modal dataset combining textual, visual, and mathematical reasoning for simple math word problems, highlighting current model limitations in handling chained operations.
Contribution
The paper introduces CLEVR-Math, a novel dataset for multi-modal math reasoning, and evaluates neural and neuro-symbolic models, revealing their inability to generalize to complex operation chains.
Findings
Models struggle with chained operations.
Neural and neuro-symbolic methods do not generalize well.
Current models have limitations in multi-modal reasoning.
Abstract
We introduce CLEVR-Math, a multi-modal math word problems dataset consisting of simple math word problems involving addition/subtraction, represented partly by a textual description and partly by an image illustrating the scenario. The text describes actions performed on the scene that is depicted in the image. Since the question posed may not be about the scene in the image, but about the state of the scene before or after the actions are applied, the solver must envision or imagine the state changes due to these actions. Solving these word problems requires a combination of language, visual and mathematical reasoning. We apply state-of-the-art neural and neuro-symbolic models for visual question answering on CLEVR-Math and empirically evaluate their performance. Our results show that neither method generalises to chains of operations. We discuss the limitations of the two in addressing the…
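To make the task concrete, here is a minimal sketch of the kind of reasoning the abstract describes: a scene is a set of objects, the text applies addition or subtraction actions to it, and the answer is a count over the resulting state. Everything in it is an assumption made for illustration. The `Obj` class, the `shape`/`color` attributes, the helper names, and the example scene are hypothetical and do not reflect the dataset's actual file format or the models evaluated in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Obj:
    # Hypothetical object representation (assumption, not the dataset schema).
    shape: str
    color: str


def matches(obj: Obj, shape: Optional[str] = None, color: Optional[str] = None) -> bool:
    """True if obj satisfies every attribute filter that was given."""
    return (shape is None or obj.shape == shape) and (color is None or obj.color == color)


def remove_all(state: List[Obj], shape: Optional[str] = None, color: Optional[str] = None) -> List[Obj]:
    """Subtraction action: drop every object matching the filters."""
    return [o for o in state if not matches(o, shape, color)]


def add(state: List[Obj], n: int, shape: str, color: str) -> List[Obj]:
    """Addition action: append n new objects with the given attributes."""
    return state + [Obj(shape, color)] * n


def count(state: List[Obj], shape: Optional[str] = None, color: Optional[str] = None) -> int:
    """Answer a 'how many ...' question over the current state."""
    return sum(matches(o, shape, color) for o in state)


# Illustrative scene standing in for what the image would depict.
scene = [
    Obj("cube", "red"),
    Obj("cube", "red"),
    Obj("sphere", "blue"),
    Obj("cylinder", "green"),
    Obj("sphere", "red"),
]

# Chained subtraction:
# "Remove all red cubes. Remove all spheres. How many objects are left?"
state = remove_all(scene, shape="cube", color="red")
state = remove_all(state, shape="sphere")
answer_chain = count(state)

# Addition: "Add 2 blue spheres. How many spheres are there?"
answer_add = count(add(scene, 2, "sphere", "blue"), shape="sphere")

print(answer_chain, answer_add)
```

The symbolic version is trivial once the scene is known. The difficulty the paper points to is that a model must recover the scene from the image, parse the actions from the text, and then carry the intermediate state through each step of the chain.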
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Video Analysis and Summarization
