Sequential Coordination of Deep Models for Learning Visual Arithmetic
Eric Crawford, Guillaume Rabusseau, Joelle Pineau

TL;DR
This paper introduces a two-tiered architecture in which a controller trained with reinforcement learning coordinates perception modules and symbolic-transformation modules to solve visual arithmetic tasks.
Contribution
It presents a novel hierarchical model that coordinates deep perception modules with symbolic reasoning, advancing the integration of perception and reasoning in neural networks.
Findings
Successfully solves visual arithmetic tasks
Improves sample efficiency over standard networks
Demonstrates effective coordination of modules via reinforcement learning
Abstract
Achieving machine intelligence requires a smooth integration of perception and reasoning, yet models developed to date tend to specialize in one or the other; sophisticated manipulation of symbols acquired from rich perceptual spaces has so far proved elusive. Consider a visual arithmetic task, where the goal is to carry out simple arithmetical algorithms on digits presented under natural conditions (e.g. hand-written, placed randomly). We propose a two-tiered architecture for tackling this problem. The lower tier consists of a heterogeneous collection of information processing modules, which can include pre-trained deep neural networks for locating and extracting characters from the image, as well as modules performing symbolic transformations on the representations extracted by perception. The higher tier consists of a controller, trained using reinforcement learning, which…
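The two-tiered layout described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (State, MODULES, classify_digit, run_episode, scripted_policy) is hypothetical, the "image" is simplified to a grid of integers so that perception becomes a lookup, and the controller is a fixed scripted plan standing in for the policy that the paper trains with reinforcement learning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def classify_digit(grid: List[List[int]], row: int, col: int) -> int:
    """Hypothetical stand-in for a pre-trained digit classifier.

    Assumption: the grid already holds integers, so "perception" is a lookup.
    """
    return grid[row][col]


@dataclass
class State:
    grid: List[List[int]]
    row: int = 0
    col: int = 0
    last_digit: int = 0   # result of the most recent perception call
    accumulator: int = 0  # symbolic working memory


# Lower tier: each module reads and updates the shared state.
def move_right(s: State) -> None:
    s.col = min(s.col + 1, len(s.grid[0]) - 1)


def move_down(s: State) -> None:
    s.row = min(s.row + 1, len(s.grid) - 1)


def read_digit(s: State) -> None:
    s.last_digit = classify_digit(s.grid, s.row, s.col)


def add_to_accumulator(s: State) -> None:
    s.accumulator += s.last_digit


MODULES: Dict[str, Callable[[State], None]] = {
    "right": move_right,
    "down": move_down,
    "read": read_digit,
    "add": add_to_accumulator,
}


# Higher tier: the controller picks one module per step.
def run_episode(grid: List[List[int]],
                policy: Callable[[State, int], str],
                max_steps: int = 20) -> int:
    s = State(grid)
    for t in range(max_steps):
        action = policy(s, t)
        if action == "stop":
            break
        MODULES[action](s)
    return s.accumulator


def scripted_policy(plan: List[str]) -> Callable[[State, int], str]:
    """Fixed plan used only for illustration; a learned policy would go here."""
    def policy(state: State, t: int) -> str:
        return plan[t] if t < len(plan) else "stop"
    return policy


# Sum the three digits in a single row: read, add, step right, repeat.
PLAN = ["read", "add", "right", "read", "add", "right", "read", "add"]
total = run_episode([[3, 4, 5]], scripted_policy(PLAN))
```

Keeping the modules behind a single name-to-function table means the controller only has to choose among action names, which is the kind of interface a reinforcement learning policy could be trained against in place of the scripted plan.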
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Vision and Imaging · Advanced Neural Network Applications
