MambaVC: Learned Visual Compression with Selective State Spaces

Shiyu Qin; Jinpeng Wang; Yimin Zhou; Bin Chen; Tianci Luo; Baoyi An; Tao Dai; Shutao Xia; Yaowei Wang

arXiv:2405.15413·eess.IV·May 29, 2024·3 cites


Open Access · 1 Repo

TL;DR

MambaVC introduces a novel visual compression network based on state-space models that achieves superior rate-distortion performance with lower computational and memory costs, especially effective on high-resolution images.

Contribution

This paper pioneers the use of state-space models in learned visual compression, developing a new VSS block with 2D selective scanning for improved global context modeling.

Findings

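01

Outperforms CNN- and Transformer-based methods in rate-distortion performance on the Kodak dataset.

02

Reduces computational cost by 42% and 24% relative to the CNN and Transformer variants, respectively.

03

Saves 12% and 71% of memory relative to the same CNN and Transformer variants, respectively.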

Abstract

Learned visual compression is an important and active task in multimedia. Existing approaches have explored various CNN- and Transformer-based designs to model content distribution and eliminate redundancy, where balancing efficacy (i.e., rate-distortion trade-off) and efficiency remains a challenge. Recently, state-space models (SSMs) have shown promise due to their long-range modeling capacity and efficiency. Inspired by this, we take the first step to explore SSMs for visual compression. We introduce MambaVC, a simple, strong and efficient compression network based on SSM. MambaVC develops a visual state space (VSS) block with a 2D selective scanning (2DSS) module as the nonlinear activation function after each downsampling, which helps to capture informative global contexts and enhances compression. On compression benchmark datasets, MambaVC achieves superior rate-distortion…
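The idea of scanning a 2D feature map along several directions with an input-dependent recurrence can be illustrated with a small NumPy sketch. This is not the authors' implementation: the four scan directions, the averaging merge, the sigmoid-gated recurrence (used here in place of a discretized state-space model), and every name (`scan_orders`, `toy_selective_scan`, `toy_2d_selective_scan`, `w_gate`, `w_in`) are assumptions chosen for illustration only.

```python
import numpy as np


def scan_orders(h, w):
    """Four visiting orders over a flattened h*w grid (assumed directions)."""
    idx = np.arange(h * w).reshape(h, w)
    row_major = idx.reshape(-1)    # left-to-right, then top-to-bottom
    col_major = idx.T.reshape(-1)  # top-to-bottom, then left-to-right
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def toy_selective_scan(seq, w_gate, w_in):
    """Toy input-dependent recurrence over a sequence of shape (L, C).

    state_t = a_t * state_{t-1} + (1 - a_t) * b_t, where the gate a_t and the
    input b_t both depend on the current token. A simplified stand-in for a
    selective state-space update, not the Mamba formulation.
    """
    a = 1.0 / (1.0 + np.exp(-(seq @ w_gate)))  # gates in (0, 1), shape (L, C)
    b = seq @ w_in                             # projected inputs, shape (L, C)
    state = np.zeros(seq.shape[1])
    out = np.empty_like(seq)
    for t in range(seq.shape[0]):
        state = a[t] * state + (1.0 - a[t]) * b[t]
        out[t] = state
    return out


def toy_2d_selective_scan(x, w_gate, w_in):
    """Map x of shape (H, W, C) to (H, W, C) by averaging four directional scans."""
    hgt, wid, c = x.shape
    flat = x.reshape(hgt * wid, c)
    merged = np.zeros_like(flat)
    for order in scan_orders(hgt, wid):
        scanned = toy_selective_scan(flat[order], w_gate, w_in)
        merged[order] += scanned  # scatter each result back to its pixel
    return (merged / 4.0).reshape(hgt, wid, c)


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 5, 3))    # hypothetical feature map
gate_weights = rng.standard_normal((3, 3))   # hypothetical parameters
input_weights = rng.standard_normal((3, 3))
context = toy_2d_selective_scan(features, gate_weights, input_weights)
```

Because each directional scan carries state from earlier positions, every output pixel in the merged map depends on pixels in all four directions, which is the sense in which such a module can aggregate global context at a cost linear in the number of pixels.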


Code & Models

Repositories

qinsy123/2024-mambavc · PyTorch · Official


Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Advanced Data Compression Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections