Dual-Latent Collaborative Decoding for Fidelity-Perception Balanced Image Compression
Qi Mao, Zijian Wang, Zhengxue Cheng, Lingyu Zhu, Siwei Ma

TL;DR
This paper introduces MoDE (Mixture of Decoder Experts), a dual-latent decoding framework for learned image compression that balances fidelity and perceptual quality by decomposing reconstruction across two complementary latent representations.
Contribution
The paper proposes a dual-latent collaborative decoding method that coordinates scalar-quantized (SQ) and vector-quantized (VQ) latents to improve the fidelity-perception trade-off.
Findings
MoDE is reported to outperform existing methods across a range of bitrates.
The framework effectively balances structural fidelity and perceptual realism.
Decoder-side expert collaboration enhances compression quality.
Abstract
Learned image compression (LIC) increasingly requires reconstructions that balance distortion fidelity and perceptual realism across a wide range of bitrates. However, most existing methods still rely on a single compressed latent representation to simultaneously carry structural details, semantic cues, and perceptual priors, requiring the same latent representation to serve multiple, potentially conflicting roles. This tension becomes evident across different latent paradigms: scalar-quantized (SQ) continuous latents provide rate-scalable fidelity but tend to lose perceptual details at low rates, while vector-quantized (VQ) discrete tokens preserve compact semantic cues but suffer from limited structural fidelity and bitrate scalability. To address this issue, we propose Mixture of Decoder Experts (MoDE), a dual-latent collaborative decoding framework that decomposes reconstruction…
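The SQ/VQ contrast drawn in the abstract can be illustrated with a small NumPy sketch. This is a generic illustration of the two quantization paradigms, not the MoDE architecture: the function names `scalar_quantize` and `vector_quantize`, the latent shape, the step sizes, and the random codebook are all assumptions made for the example. It shows why SQ is rate-scalable (the step size is a free knob) while plain VQ spends a fixed number of bits per token (set by the codebook size).

```python
import numpy as np

def scalar_quantize(y, step=1.0):
    """Round each latent element independently to a uniform grid of width `step`."""
    return np.round(y / step) * step

def vector_quantize(y, codebook):
    """Replace each row of `y` with its nearest codebook vector (Euclidean distance)."""
    # Squared distance from every latent vector to every codeword, shape (N, K).
    dists = ((y[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

# Hypothetical toy data: 6 latent vectors of dimension 4, and an 8-entry codebook.
rng = np.random.default_rng(0)
latents = rng.normal(size=(6, 4))
codebook = rng.normal(size=(8, 4))

# SQ: a smaller step means lower distortion (and, in a real codec, a higher bitrate).
sq_fine = scalar_quantize(latents, step=0.25)
sq_coarse = scalar_quantize(latents, step=2.0)
mse_fine = float(np.mean((latents - sq_fine) ** 2))
mse_coarse = float(np.mean((latents - sq_coarse) ** 2))

# VQ: each vector becomes one index; a fixed-length code costs log2(K) bits per token.
vq_hat, vq_idx = vector_quantize(latents, codebook)
bits_per_token = float(np.log2(len(codebook)))

print(f"SQ MSE (step=0.25): {mse_fine:.4f}")
print(f"SQ MSE (step=2.0):  {mse_coarse:.4f}")
print(f"VQ indices: {vq_idx.tolist()}, bits per token: {bits_per_token:.1f}")
```

In a learned codec the SQ latents would additionally be entropy-coded under a learned prior and the VQ codebook would be trained rather than random; both are omitted here to keep the sketch minimal.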
