Learning Visual Abstract Reasoning through Dual-Stream Networks

Kai Zhao; Chang Xu; Bailu Si

arXiv:2411.19451 · cs.CV · December 2, 2024

TL;DR

This paper introduces DRNet, a dual-stream neural network inspired by the two-stream hypothesis, which effectively tackles visual abstract reasoning tasks like Raven's Progressive Matrices by capturing diverse image features and reasoning over abstract rules.

Contribution

The paper proposes a novel dual-stream neural network architecture that enhances visual abstract reasoning by integrating parallel feature extraction with rule-based reasoning, achieving state-of-the-art results.

Findings

1. DRNet achieves state-of-the-art average performance across multiple RPM benchmarks.
2. It demonstrates strong generalization to out-of-distribution scenarios.
3. The dual-stream approach effectively captures local and spatial features.

Abstract

Visual abstract reasoning tasks present challenges for deep neural networks, exposing limitations in their capabilities. In this work, we present a neural network model that addresses the challenges posed by Raven's Progressive Matrices (RPM). Inspired by the two-stream hypothesis of visual processing, we introduce the Dual-stream Reasoning Network (DRNet), which utilizes two parallel branches to capture image features. On top of the two streams, a reasoning module first learns to merge the high-level features of the same image. Then, it employs a rule extractor to handle combinations involving the eight context images and each candidate image, extracting discrete abstract rules and utilizing a multilayer perceptron (MLP) to make predictions. Empirical results demonstrate that the proposed DRNet achieves state-of-the-art average performance across multiple RPM benchmarks. Furthermore,…
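
The data flow described in the abstract (two parallel branches per image, a merge step, a rule extractor over the eight context images plus one candidate, and an MLP that scores the result) can be illustrated with a minimal, untrained sketch. This is not the authors' implementation. Random linear maps stand in for the learned branches. The class name `DualStreamSketch`, all layer sizes, the flattened-image input, and the thresholding used to make the rule vector discrete are assumptions made only for illustration.

```python
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class DualStreamSketch:
    """Toy stand-in for the pipeline in the abstract (hypothetical, untrained)."""

    def __init__(self, pixels, feat=8, rule=6, seed=0):
        rng = random.Random(seed)
        # Two parallel branches that look at the same flattened image.
        self.stream_a = rand_matrix(rng, feat, pixels)
        self.stream_b = rand_matrix(rng, feat, pixels)
        # Merges the two branch outputs of one image into one feature vector.
        self.merge = rand_matrix(rng, feat, 2 * feat)
        # Rule extractor over 8 context images + 1 candidate (9 feature vectors).
        self.rule = rand_matrix(rng, rule, 9 * feat)
        # Small MLP that turns the rule vector into a single score.
        self.hidden = rand_matrix(rng, rule, rule)
        self.out = rand_matrix(rng, 1, rule)

    def encode(self, image):
        a = relu(linear(self.stream_a, image))
        b = relu(linear(self.stream_b, image))
        return relu(linear(self.merge, a + b))  # a + b concatenates the lists

    def score(self, context, candidate):
        if len(context) != 8:
            raise ValueError("expected exactly eight context images")
        feats = [self.encode(img) for img in context + [candidate]]
        flat = [v for f in feats for v in f]
        # Assumption: thresholding at zero is used here to get a discrete
        # rule vector; the abstract does not say how discreteness is obtained.
        rule_bits = [1.0 if v > 0 else 0.0 for v in linear(self.rule, flat)]
        return linear(self.out, relu(linear(self.hidden, rule_bits)))[0]

    def predict(self, context, candidates):
        scores = [self.score(context, c) for c in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores


# Usage with random 4x4 "images" flattened to 16 values.
rng = random.Random(1)
context = [[rng.random() for _ in range(16)] for _ in range(8)]
candidates = [[rng.random() for _ in range(16)] for _ in range(8)]
model = DualStreamSketch(pixels=16)
choice, scores = model.predict(context, candidates)
print("predicted candidate index:", choice)
```

Each candidate is scored independently against the same eight context images and the highest-scoring candidate is chosen, which mirrors the "each candidate image" wording in the abstract. With random weights the prediction is meaningless; only the structure is of interest.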

Code & Models

Repositories

VecchioID/DRNet (PyTorch, official)
