Learning Abstract Visual Reasoning via Task Decomposition: A Case Study in Raven Progressive Matrices
Jakub Kwiatkowski, Krzysztof Krawiec

TL;DR
This paper introduces a transformer-based deep learning model that decomposes abstract visual reasoning tasks in Raven Progressive Matrices into subgoals, predicting object properties to improve reasoning and reduce bias.
Contribution
The novel approach predicts visual object properties and arrangements to solve RPMs, outperforming existing methods and offering interpretability and bias mitigation.
Findings
The model outperforms state-of-the-art methods on RPM tasks.
Its property predictions provide insight into the inference process.
The approach reduces known benchmark biases.
Abstract
Learning to perform abstract reasoning often requires decomposing the task in question into intermediate subgoals that are not specified upfront, but need to be autonomously devised by the learner. In Raven Progressive Matrices (RPM), the task is to choose one of the available answers given a context, where both the context and answers are composite images featuring multiple objects in various spatial arrangements. As this high-level goal is the only guidance available, learning to solve RPMs is challenging. In this study, we propose a deep learning architecture based on the transformer blueprint which, rather than directly making the above choice, addresses the subgoal of predicting the visual properties of individual objects and their arrangements. The multidimensional predictions obtained in this way are then directly juxtaposed to choose the answer. We consider a few ways in which…
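The answer-selection step described in the abstract, where predicted properties are juxtaposed to choose the answer, can be illustrated with a minimal sketch. It assumes each panel is summarised by a vector of discrete property values and that the candidate with the fewest mismatches against the prediction is chosen. The encoding, the Hamming-distance comparison, and the names `hamming_distance` and `choose_answer` are illustrative assumptions, not the authors' implementation; the abstract does not specify how the comparison is done.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions at which two equal-length property vectors differ."""
    if len(a) != len(b):
        raise ValueError("property vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def choose_answer(predicted: Sequence[int],
                  candidates: Sequence[Sequence[int]]) -> int:
    """Return the index of the candidate closest to the predicted properties.

    Ties are broken in favour of the earliest candidate.
    """
    if not candidates:
        raise ValueError("at least one candidate answer is required")
    distances = [hamming_distance(predicted, c) for c in candidates]
    return min(range(len(candidates)), key=distances.__getitem__)


# Hypothetical encoding: (shape, size, colour) of a single object.
predicted = (2, 1, 4)  # properties predicted for the missing panel (assumed)
candidates = [(2, 1, 3), (0, 1, 4), (2, 1, 4), (1, 0, 0)]
best = choose_answer(predicted, candidates)
```

Here `best` is the index of the candidate whose properties match the prediction exactly. Because the comparison happens at the level of individual properties, a mismatch can be traced to a specific property, which is the kind of interpretability the summary above refers to.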
Taxonomy
Topics: Neural Networks and Applications
