A Compact Hybrid Convolution-Frequency State Space Network for Learned Image Compression
Haodong Pan, Hao Wei, Yusong Wang, Nanning Zheng, Caigui Jiang

TL;DR
This paper introduces HCFSSNet, a hybrid image compression model combining convolutional layers with a frequency state space network to improve long-range dependency modeling while preserving 2D neighborhood relations.
Contribution
The paper proposes a hybrid architecture that pairs convolutional layers with a Vision Frequency State Space (VFSS) block and frequency-aware modules, with the aim of improving learned image compression performance.
Findings
Achieves competitive rate-distortion performance on benchmark datasets.
Effectively models long-range dependencies with neighborhood preservation.
Outperforms recent LIC codecs in experiments.
Abstract
Learned image compression (LIC) has recently benefited from Transformer- and state space model (SSM)-based backbones for modeling long-range dependencies. However, the former typically incurs quadratic complexity, whereas the latter often disrupts neighborhood continuity by flattening 2D features into 1D sequences. To address these issues, we propose a compact Hybrid Convolution and Frequency State Space Network (HCFSSNet) for LIC. HCFSSNet combines convolutional layers for local detail modeling with a Vision Frequency State Space (VFSS) block for complementary long-range contextual aggregation. Specifically, the VFSS block consists of a Vision Omni-directional Neighborhood State Space (VONSS) module, which scans features along horizontal, vertical, and diagonal directions to better preserve 2D neighborhood relations, and an Adaptive Frequency Modulation Module (AFMM), which performs…
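The idea of scanning a 2D feature map along several directions can be illustrated with a small NumPy sketch. This is not the paper's VONSS implementation, whose details are not given in this excerpt; it only shows how horizontal, vertical, and diagonal traversal orders differ when a C×H×W feature map is flattened into a 1D sequence, and how each flattening can be undone. The function names (`scan_orders`, `flatten_along`, `restore`) and the choice of four directions are assumptions made for illustration.

```python
import numpy as np


def scan_orders(height, width):
    """Return flat-index orderings for four illustrative scan directions.

    Hypothetical helper: the real VONSS scan paths may differ.
    """
    idx = np.arange(height * width).reshape(height, width)
    offsets = range(-(height - 1), width)
    return {
        # Row by row, left to right.
        "horizontal": idx.reshape(-1),
        # Column by column, top to bottom.
        "vertical": idx.T.reshape(-1),
        # Along each top-left to bottom-right diagonal.
        "diagonal": np.concatenate([np.diagonal(idx, offset=k) for k in offsets]),
        # Along each top-right to bottom-left diagonal.
        "anti_diagonal": np.concatenate(
            [np.diagonal(np.fliplr(idx), offset=k) for k in offsets]
        ),
    }


def flatten_along(feature, order):
    """Flatten a (C, H, W) feature map into a (C, H*W) sequence in scan order."""
    channels = feature.shape[0]
    return feature.reshape(channels, -1)[:, order]


def restore(sequence, order, height, width):
    """Invert flatten_along: scatter a (C, H*W) sequence back to (C, H, W)."""
    out = np.empty_like(sequence)
    out[:, order] = sequence
    return out.reshape(sequence.shape[0], height, width)


if __name__ == "__main__":
    feature = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
    for name, order in scan_orders(3, 4).items():
        seq = flatten_along(feature, order)
        assert np.array_equal(restore(seq, order, 3, 4), feature)
        print(name, order.tolist())
```

In a row-major flattening, vertically adjacent pixels end up `width` positions apart in the sequence. The vertical and diagonal orders place different sets of 2D neighbors next to each other, which is the intuition behind combining several scan directions.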
