Robust Residual Finite Scalar Quantization for Neural Compression
Xiaoxu Zhu, Xiaojie Yu, Guangchao Yao, Yiming Ren, Baoxiang Li

TL;DR
This paper introduces RFSQ, a residual finite scalar quantization method whose conditioning strategies improve neural compression on both audio and image data.
Contribution
RFSQ uses learnable scaling factors and invertible layer normalization to counter residual magnitude decay, in which later quantization stages receive progressively weaker signals, and thereby improves multi-stage neural compression.
Findings
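At 24 bits/frame, RFSQ-LayerNorm reaches an audio DNSMOS of 3.646, a 3.6% improvement over RVQ (3.518).
On ImageNet, RFSQ achieves 0.102 L1 loss and 0.100 perceptual loss; LayerNorm improves L1 loss by 9.7% and perceptual loss by 17.4% over unconditioned variants.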
LayerNorm consistently outperforms other conditioning strategies.
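As a quick sanity check on the audio figure, the relative improvement can be recomputed from the two DNSMOS scores reported in the abstract (3.646 for RFSQ-LayerNorm, 3.518 for RVQ). The variable names below are illustrative only.

```python
# DNSMOS scores as reported in the abstract (higher is better).
dnsmos_rfsq_layernorm = 3.646
dnsmos_rvq = 3.518

# Relative improvement of RFSQ-LayerNorm over RVQ, in percent.
relative_improvement = (dnsmos_rfsq_layernorm - dnsmos_rvq) / dnsmos_rvq * 100
print(f"{relative_improvement:.1f}%")
```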
Abstract
Finite Scalar Quantization (FSQ) offers simplified training but suffers from residual magnitude decay in multi-stage settings, where subsequent stages receive exponentially weaker signals. We propose Robust Residual Finite Scalar Quantization (RFSQ), addressing this fundamental limitation through two novel conditioning strategies: learnable scaling factors and invertible layer normalization. Our experiments across audio and image modalities demonstrate RFSQ's effectiveness and generalizability. In audio reconstruction at 24 bits/frame, RFSQ-LayerNorm achieves 3.646 DNSMOS, a 3.6% improvement over state-of-the-art RVQ (3.518). On ImageNet, RFSQ achieves 0.102 L1 loss and 0.100 perceptual loss, with LayerNorm providing 9.7% L1 improvement and 17.4% perceptual improvement over unconditioned variants. The LayerNorm strategy consistently outperforms alternatives by maintaining normalized…
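The multi-stage idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`fsq`, `ln_forward`, `ln_inverse`, `residual_fsq`), the tanh-then-round quantizer, the choice of 5 levels, the normalization axis, and the `eps` value are all assumptions made for the example. It also omits training details such as straight-through gradients and the learnable-scaling variant.

```python
import numpy as np

def fsq(z, levels=5):
    """Simplified FSQ: squash with tanh, then round to `levels` uniform values in [-1, 1]."""
    half = (levels - 1) / 2.0  # assumes an odd number of levels
    return np.round(np.tanh(z) * half) / half

def ln_forward(x, eps=1e-5):
    """Normalize over the last axis and return the statistics needed to undo it."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True) + eps
    return (x - mu) / sigma, mu, sigma

def ln_inverse(z, mu, sigma):
    """Map a normalized tensor back to the original scale using the stored statistics."""
    return z * sigma + mu

def residual_fsq(x, num_stages=4, levels=5, layernorm=True):
    """Quantize x in stages; each stage quantizes what the previous stages left over."""
    residual = np.array(x, dtype=np.float64)
    recon = np.zeros_like(residual)
    stage_norms = []  # residual magnitude seen by each stage
    for _ in range(num_stages):
        stage_norms.append(float(np.linalg.norm(residual)))
        if layernorm:
            # Condition the residual to zero mean and unit variance before quantizing,
            # then invert the normalization so the stage output is on the original scale.
            z, mu, sigma = ln_forward(residual)
            q = ln_inverse(fsq(z, levels), mu, sigma)
        else:
            # Unconditioned variant: the fixed quantizer sees the raw residual.
            q = fsq(residual, levels)
        recon += q
        residual = residual - q
    return recon, stage_norms

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))  # hypothetical batch of 2 latent vectors with 8 dimensions
recon_ln, norms_ln = residual_fsq(x, layernorm=True)
recon_raw, norms_raw = residual_fsq(x, layernorm=False)
```

Comparing `norms_ln` with `norms_raw` shows the residual magnitude each stage receives in this toy setting. The key design point is that the normalization is invertible: the per-vector mean and standard deviation are kept, so each stage's quantized output can be mapped back to the scale of the residual it came from.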
