TL;DR
This paper introduces Content-Aware Mamba (CAM), a novel state-space model for learned image compression that adaptively captures global redundancy by content-aware token permutation and global priors, achieving state-of-the-art results.
Contribution
The paper proposes CAM, a content-adaptive SSM that overcomes the rigid, predefined scan order of previous Mamba-based models through content-aware token permutation and sample-specific global priors, improving compression performance.
Findings
Outperforms VTM-21.0 by 15.91% in BD-rate on Kodak
Achieves 21.34% BD-rate reduction on Tecnick
Surpasses previous methods in rate-distortion metrics
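The percentages above are BD-rate figures: the average bitrate difference between two codecs at equal quality, where a negative value means the test codec needs fewer bits. The following is a minimal sketch of the idea, not the evaluation code used by the authors. It assumes piecewise-linear interpolation of log-rate over PSNR, which is a simplification of the cubic fit used in the original Bjøntegaard method, and the rate-distortion points are made-up illustrative values, not results from the paper.

```python
import math

def interp(x, xs, ys):
    """Piecewise-linear interpolation of the points (xs, ys) at x (xs increasing)."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the curve's quality range")

def mean_log_rate(curve, lo, hi, steps=1000):
    """Trapezoidal average of log(bpp) over the PSNR interval [lo, hi]."""
    psnr = [q for _, q in curve]
    log_rate = [math.log(r) for r, _ in curve]
    total = 0.0
    for i in range(steps + 1):
        x = min(lo + (hi - lo) * i / steps, hi)  # clamp against rounding
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * interp(x, psnr, log_rate)
    return total / steps  # the trapezoid weights sum to `steps`

def bd_rate(anchor, test):
    """Average rate change (percent) of `test` vs `anchor` at equal PSNR.

    Each curve is a list of (bpp, psnr) pairs sorted by increasing PSNR.
    Simplified variant: piecewise-linear instead of a cubic polynomial fit.
    """
    lo = max(anchor[0][1], test[0][1])    # overlapping quality range
    hi = min(anchor[-1][1], test[-1][1])
    if hi <= lo:
        raise ValueError("the curves share no PSNR range")
    diff = mean_log_rate(test, lo, hi) - mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0

# Illustrative (made-up) RD points: the test codec spends 20% fewer bits
# than the anchor at every quality level.
anchor = [(0.20, 30.0), (0.40, 33.0), (0.80, 36.0), (1.60, 39.0)]
test = [(0.8 * rate, quality) for rate, quality in anchor]
print(round(bd_rate(anchor, test), 2))  # -20.0
```

Averaging in the log-rate domain is what makes the result a relative saving: a codec that uses 20% fewer bits at every quality level scores -20% regardless of the absolute bitrates.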
Abstract
Recent learned image compression (LIC) leverages Mamba-style state-space models (SSMs) for global receptive fields with linear complexity. However, the standard Mamba adopts content-agnostic, predefined raster (or multi-directional) scans under strict causality. This rigidity hinders its ability to effectively eliminate redundancy between tokens that are content-correlated but spatially distant. We introduce Content-Aware Mamba (CAM), an SSM that dynamically adapts its processing to the image content. Specifically, CAM overcomes prior limitations with two novel mechanisms. First, it replaces the rigid scan with a content-adaptive token permutation strategy to prioritize interactions between content-similar tokens regardless of their location. Second, it overcomes the sequential dependency by injecting sample-specific global priors into the state-space model, which effectively mitigates…
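To make the first mechanism concrete, the sketch below shows one generic way to permute tokens by content before a causal scan and to restore the spatial order afterwards. It is not the paper's implementation. The nearest-centroid grouping, the fixed `centroids`, and the toy `causal_scan` recurrence (a stand-in for a real SSM layer) are all assumptions chosen for illustration.

```python
def nearest_centroid(token, centroids):
    """Index of the centroid closest to `token` (squared Euclidean distance)."""
    dists = [sum((t - c) ** 2 for t, c in zip(token, centroid))
             for centroid in centroids]
    return dists.index(min(dists))

def content_permutation(tokens, centroids):
    """Order token indices so that tokens in the same cluster become adjacent.

    Python's sort is stable, so tokens within a cluster keep raster order.
    """
    labels = [nearest_centroid(tok, centroids) for tok in tokens]
    return sorted(range(len(tokens)), key=lambda i: labels[i])

def invert(perm):
    """Inverse permutation: inv[original_index] = position after permuting."""
    inv = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inv[old_pos] = new_pos
    return inv

def causal_scan(seq, decay=0.5):
    """Toy causal recurrence h_t = decay * h_{t-1} + x_t, applied per channel.

    Hypothetical stand-in for an SSM: each output only sees earlier tokens.
    """
    state = [0.0] * len(seq[0])
    out = []
    for x in seq:
        state = [decay * h + v for h, v in zip(state, x)]
        out.append(list(state))
    return out

# Four one-channel tokens in raster order; tokens 0 and 2 are similar,
# as are tokens 1 and 3, although neither pair is adjacent in the raster scan.
tokens = [[0.0], [10.0], [0.1], [9.9]]
centroids = [[0.0], [10.0]]  # assumed to be given, e.g. by a clustering step

perm = content_permutation(tokens, centroids)  # [0, 2, 1, 3]
permuted = [tokens[i] for i in perm]           # similar tokens now adjacent
scanned = causal_scan(permuted)                # scan in content order
inv = invert(perm)
restored = [scanned[inv[i]] for i in range(len(tokens))]  # back to raster order
```

After the permutation, the scan reaches token 2 immediately after token 0, so the recurrent state still carries information about its content-similar neighbour. In the raster scan the dissimilar token 1 would sit between them.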
Peer Reviews
Decision·ICLR 2026 Poster
1. The writing quality is good, and the explanation is clear.
2. The motivation for improving Mamba’s fixed scanning order is reasonable, and enhancing adaptability is a valid direction.
3. The paper provides a comprehensive evaluation of complexity, including model size, FLOPs, latency, and memory usage.
4. The ERF visualizations clearly demonstrate the model's global perception ability and content adaptiveness.
5. The paper compares only with MambaIC from CVPR 2025, ignoring LALIC and DCAE. Since LALIC and DCAE released their code several months ago, a comparison with these methods is essential.
6. The transform module in CMIC appears to be much deeper than in other methods. In Stage 2 and Stage 3, there are four blocks of attention and CAM in total; is this correct?
7. The proposed Learnable Prompt Dictionary is essentially the Attentive State Space Module from MambaIRv2. The authors do not mentio
- Good RD performance while maintaining low complexity.
- Relieves the high memory usage of prior Mamba-based learned image codecs.
- The methods proposed for both clustering partitioning and global prior modulation are common in prior "content-adaptive" studies, such as Routing Transformer [1]. Even in the field of learned image compression, similar works exist [2].
- The main contribution of this paper is to improve the Mamba module in the Mamba-based learned image codec. However, it seems that there are many other differences compared to the previous works, e.g., MambaVC and MambaIC, including the entropy mo
1. The overall writing of the paper is clear, and the figures are well-presented.
2. The proposed method achieves the best experimental performance.
1. It is necessary to clarify the specific differences between the proposed method and MambaIRv2 [1]. Both content-aware selective scanning and the Learnable Prompt have been proposed in MambaIRv2. Although MambaIRv2 and the proposed CMIC are applied to two different tasks, the designs in this paper are mostly borrowed directly from MambaIRv2. There is insufficient technical contribution or further insight beyond MambaIRv2 that specifically benefits image compression.
2. T
