Multistage Spatial Context Models for Learned Image Compression
Fangzheng Lin, Heming Sun, Jinming Liu, Jiro Katto

TL;DR
This paper introduces multistage spatial context models for learned image compression. Through patch-based decoding and an optimized decoding order, they decode about as fast as checkerboard models while achieving rate-distortion (RD) performance on par with autoregressive models.
Contribution
The paper proposes a novel multistage spatial context model that balances decoding speed and RD performance through patch-based decoding and an optimized decoding order.
Findings
Achieves decoding speed similar to checkerboard models.
Matches or exceeds autoregressive RD performance.
Introduces a decoding order optimization algorithm.
Abstract
Recent state-of-the-art Learned Image Compression methods feature spatial context models, achieving great rate-distortion improvements over hyperprior methods. However, the autoregressive context model requires serial decoding, limiting runtime performance. The Checkerboard context model allows parallel decoding at the cost of reduced RD performance. We present a series of multistage spatial context models allowing both fast decoding and better RD performance. We split the latent space into square patches and decode serially within each patch while different patches are decoded in parallel. The proposed method features a decoding speed comparable to Checkerboard while matching or even outperforming the RD performance of Autoregressive. Inside each patch, the decoding order must be carefully decided, as a bad order negatively impacts performance; therefore, we also…
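The patch-based schedule described in the abstract can be illustrated with a minimal sketch. It assigns every latent position a decoding stage from its offset inside a square patch, so positions that share a stage can be decoded in one parallel pass. The function names, the 2x2 patch size, and the example order are assumptions chosen for illustration; they are not the authors' implementation, and the order shown is not the optimized order from the paper.

```python
def stage_map(height, width, patch, order):
    """Return a grid where grid[y][x] is the decoding stage of position (y, x).

    `order` lists the (row, col) offsets inside one patch in the order they
    are decoded. It is a hypothetical input, not the paper's optimized order.
    """
    expected = [(r, c) for r in range(patch) for c in range(patch)]
    if sorted(order) != expected:
        raise ValueError("order must visit every patch offset exactly once")
    stage_of = {offset: stage for stage, offset in enumerate(order)}
    return [
        [stage_of[(y % patch, x % patch)] for x in range(width)]
        for y in range(height)
    ]


def positions_per_stage(grid):
    """Group positions by stage; each group is decoded in one parallel pass."""
    n_stages = max(max(row) for row in grid) + 1
    groups = [[] for _ in range(n_stages)]
    for y, row in enumerate(grid):
        for x, stage in enumerate(row):
            groups[stage].append((y, x))
    return groups


# Illustrative 2x2 patches on a 4x4 latent grid: 4 serial stages in total.
grid = stage_map(4, 4, patch=2, order=[(0, 0), (1, 1), (0, 1), (1, 0)])
groups = positions_per_stage(grid)
```

With this layout the number of serial steps equals the number of positions in a patch (4 here) and does not grow with the latent size, whereas a fully autoregressive raster scan over the same 4x4 grid would need 16 steps.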
Taxonomy
Topics: Advanced Data Compression Techniques · Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
