VisionGRU: A Linear-Complexity RNN Model for Efficient Image Analysis

Shicheng Yin; Kaixuan Yin; Weixing Chen; Enbo Huang; Yang Liu

arXiv:2412.18178 · cs.CV · December 30, 2024 · 2 citations

TL;DR

VisionGRU introduces a linear-complexity RNN architecture for efficient high-resolution image analysis. It outperforms ViTs in both accuracy and resource usage and scales to larger vision workloads.

Contribution

It proposes a novel RNN-based model, VisionGRU, with a simplified minGRU and hierarchical modules for efficient multi-scale image feature extraction.
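
For readers unfamiliar with minGRU, the sketch below shows the recurrence as defined by Feng et al. (2024), "Were RNNs All We Needed?". The update gate z_t and the candidate state h̃_t depend only on the current input x_t, not on h_{t-1}. As a result, h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t can be solved with a prefix scan instead of a strictly sequential loop. This assumes VisionGRU builds on that standard form; the paper's simplified variant may differ in details such as normalization, log-space computation, or scan direction. Plain Python lists stand in for tensors, and every function name here is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product w @ x (stands in for a bias-free linear layer)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gate_and_candidate(x: Vector, w_z: Matrix, w_h: Matrix) -> Tuple[Vector, Vector]:
    """minGRU computes z_t and h~_t from x_t alone (no dependence on h_{t-1})."""
    z = [sigmoid(v) for v in matvec(w_z, x)]
    h_tilde = matvec(w_h, x)
    return z, h_tilde


def min_gru_sequential(xs: List[Vector], w_z: Matrix, w_h: Matrix, h0: Vector) -> List[Vector]:
    """h_t = (1 - z_t) * h_{t-1} + z_t * h~_t, evaluated step by step."""
    h, outputs = list(h0), []
    for x in xs:
        z, h_tilde = gate_and_candidate(x, w_z, w_h)
        h = [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]
        outputs.append(h)
    return outputs


def min_gru_scan(xs: List[Vector], w_z: Matrix, w_h: Matrix, h0: Vector) -> List[Vector]:
    """Same recurrence rewritten as h_t = a_t * h_{t-1} + b_t and solved by a prefix scan."""
    # Step 1: coefficients for each t are independent of each other, so this loop could run in parallel.
    coeffs = []
    for x in xs:
        z, h_tilde = gate_and_candidate(x, w_z, w_h)
        coeffs.append(([1.0 - zi for zi in z], [zi * ci for zi, ci in zip(z, h_tilde)]))
    # Step 2: inclusive scan with the associative operator
    # (A, B) followed by (a, b) -> (A * a, a * B + b); identity is (1, 0).
    # Written serially here; associativity is what permits a parallel (log-depth) scan.
    acc_a, acc_b = [1.0] * len(h0), [0.0] * len(h0)
    outputs = []
    for a, b in coeffs:
        acc_a = [pa * ai for pa, ai in zip(acc_a, a)]
        acc_b = [ai * pb + bi for ai, pb, bi in zip(a, acc_b, b)]
        # h_t = A_t * h0 + B_t
        outputs.append([pa * hi + pb for pa, pb, hi in zip(acc_a, acc_b, h0)])
    return outputs


if __name__ == "__main__":
    # Toy sizes: 3 time steps (patch tokens), 2 channels; weights are arbitrary.
    xs = [[1.0, 0.5], [0.2, -0.3], [-1.0, 2.0]]
    w_z = [[0.1, 0.2], [0.3, -0.1]]
    w_h = [[0.5, -0.5], [1.0, 0.2]]
    h0 = [0.0, 0.0]
    for seq, par in zip(min_gru_sequential(xs, w_z, w_h, h0), min_gru_scan(xs, w_z, w_h, h0)):
        print(seq, par)
```

Both functions return the same hidden states. The scan form shows how minGRU can have cost linear in sequence length while still allowing parallel training: each step's coefficients are computed independently, and combining them is associative.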

Findings

1. Outperforms ViTs on ImageNet and ADE20K datasets.
2. Reduces memory and computational costs significantly.
3. Effective for high-resolution image classification and segmentation.

Abstract

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are two dominant models for image analysis. While CNNs excel at extracting multi-scale features and ViTs effectively capture global dependencies, both suffer from high computational costs, particularly when processing high-resolution images. Recently, state-space models (SSMs) and recurrent neural networks (RNNs) have attracted attention due to their efficiency. However, their performance in image classification tasks remains limited. To address these challenges, this paper introduces VisionGRU, a novel RNN-based architecture designed for efficient image classification. VisionGRU leverages a simplified Gated Recurrent Unit (minGRU) to process large-scale image features with linear complexity. It divides images into smaller patches and progressively reduces the sequence length while increasing the channel depth, thus…
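
The abstract's hierarchy first splits the image into patches, then shortens the token sequence while widening channels. A short sketch makes this concrete. The patch size of 4, the 2×2 downsampling between stages, and the channel widths 64/128/256/512 are hypothetical values chosen for illustration, not taken from the paper. `stage_schedule` and `interaction_counts` are likewise hypothetical helpers. The comparison counts only token interactions per layer and ignores per-token channel costs.

```python
from typing import List, Tuple


def stage_schedule(height: int, width: int, patch: int, dims: List[int]) -> List[Tuple[int, int]]:
    """Return (sequence_length, channels) for each stage of a hypothetical hierarchy.

    Stage 1 sees (H / patch) * (W / patch) tokens; each later stage halves both
    spatial sides (4x fewer tokens) while its channel count comes from `dims`.
    """
    h, w = height // patch, width // patch
    schedule = []
    for i, channels in enumerate(dims):
        if i > 0:
            h, w = h // 2, w // 2  # assumed 2x2 downsampling between stages
        schedule.append((h * w, channels))
    return schedule


def interaction_counts(seq_len: int) -> Tuple[int, int]:
    """Token interactions per layer: a recurrence visits each token once (O(L)),
    while full self-attention scores every token pair (O(L^2))."""
    return seq_len, seq_len * seq_len


if __name__ == "__main__":
    # Hypothetical settings: 224x224 input, 4x4 patches, channel widths doubling per stage.
    for seq_len, channels in stage_schedule(224, 224, 4, [64, 128, 256, 512]):
        recurrent, attention = interaction_counts(seq_len)
        print(f"L={seq_len:5d}  C={channels:3d}  recurrent={recurrent:,}  attention={attention:,}")
```

For a 224×224 input, this schedule gives sequence lengths of 3136, 784, 196 and 49. Doubling the input side to 448 multiplies the stage-1 length by 4, but multiplies the attention pair count by 16. That gap is why linear-time sequence mixing matters most for high-resolution images.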

Code & Models

Repositories

yangliu9208/visiongru (PyTorch, official)

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · CCD and CMOS Imaging Sensors

Methods: Softmax · Attention Is All You Need