TL;DR
This paper introduces the Multi-Reference Entropy Model (MEM and MEM$^+$) for learned image compression, capturing diverse correlations in latent representations to achieve state-of-the-art rate-distortion performance.
Contribution
The paper proposes MEM and MEM$^+$ models that effectively capture channel-wise, local, and global correlations in latent representations, advancing learned image compression techniques.
Findings
Achieves state-of-the-art BD-rate reductions of 8.05% and 11.39% on the Kodak dataset.
Demonstrates improved rate-distortion performance over existing models.
Introduces enhanced checkerboard context-capturing techniques.
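As a point of reference for the checkerboard finding, here is a minimal sketch of the generic two-pass checkerboard split that such techniques build on. It is not the paper's enhanced variant. The function names, the choice of which parity counts as "anchor", and the 4-connected neighbourhood are all assumptions made for illustration.

```python
def checkerboard_masks(height, width):
    """Return (anchor, non_anchor) boolean grids for a two-pass checkerboard split.

    Assumption: positions with even (row + col) are anchors, decoded in pass 1.
    """
    anchor = [[(r + c) % 2 == 0 for c in range(width)] for r in range(height)]
    non_anchor = [[not v for v in row] for row in anchor]
    return anchor, non_anchor


def in_bounds_neighbors(r, c, height, width):
    """4-connected neighbours of (r, c) that lie inside the grid."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < height and 0 <= j < width]


anchor, non_anchor = checkerboard_masks(4, 4)

# Every neighbour of a non-anchor position is an anchor, so pass 2 can use
# the already-decoded anchors as local context for all non-anchors at once.
context_for_0_1 = in_bounds_neighbors(0, 1, 4, 4)
```

Because each non-anchor position is surrounded only by anchors, the second pass can be computed in parallel rather than position by position.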
Abstract
Recently, learned image compression has achieved remarkable performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in boosting rate-distortion performance. However, most entropy models only capture correlations in one dimension, while the latent representation contains channel-wise, local spatial, and global spatial correlations. To tackle this issue, we propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM$^+$. These models capture the different types of correlations present in the latent representation. Specifically, we first divide the latent representation into slices. When decoding the current slice, we use previously decoded slices as context and employ the attention map of the previously decoded slice to predict global correlations in the current slice. To capture local contexts, we introduce two…
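The slice-by-slice decoding order described in the abstract can be sketched as follows. This is a minimal illustration that only tracks which channels are available as context at each step. The function names, the contiguous near-equal partition, and the channel and slice counts are assumptions, and the attention-based global prediction is not modelled here.

```python
def split_into_slices(num_channels, num_slices):
    """Partition channel indices into contiguous, near-equal slices (an assumption)."""
    base, extra = divmod(num_channels, num_slices)
    slices, start = [], 0
    for k in range(num_slices):
        size = base + (1 if k < extra else 0)
        slices.append(list(range(start, start + size)))
        start += size
    return slices


def decoding_schedule(num_channels, num_slices):
    """For each slice, record the channels already decoded, i.e. its channel-wise context."""
    schedule, decoded = [], []
    for current in split_into_slices(num_channels, num_slices):
        schedule.append({"decode": current, "context": list(decoded)})
        decoded.extend(current)
    return schedule


# Hypothetical sizes: 8 latent channels split into 4 slices.
schedule = decoding_schedule(8, 4)
```

The first slice has an empty context, and each later slice sees every channel decoded before it, which is what lets the entropy model condition on previously decoded slices.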
