Effortless Cross-Platform Video Codec: A Codebook-Based Method
Kuan Tian, Yonghang Guan, Jinxi Xiang, Jun Zhang, Xiao Han, and Wei Yang

TL;DR
This paper introduces a cross-platform video codec that uses codebooks and a cross-attention mechanism, eliminating entropy models and ensuring consistent decoding across different hardware platforms, with competitive compression performance.
Contribution
The proposed framework removes the need for entropy models and optical flow, enabling efficient, cross-platform video compression with consistent decoding and improved performance over traditional codecs.
Findings
Outperforms H.265 (medium) in compression quality.
Eliminates entropy model inconsistencies across platforms.
Achieves high efficiency without optical flow or autoregressive models.
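To make the codebook idea concrete, here is a minimal sketch of why fixed-length codebook indices do not need an entropy model: the decoder only performs an integer table lookup. The codebook values, its size, and the function names are illustrative assumptions. The paper's codebooks are learned and multi-stage, which this toy version does not attempt to reproduce.

```python
import math

# Hypothetical toy codebook: 4 codewords of dimension 2 (a real codec would learn these).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
BITS_PER_INDEX = math.ceil(math.log2(len(CODEBOOK)))  # fixed-length code: 2 bits per index

def nearest_index(vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(CODEBOOK)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, CODEBOOK[i])))

def encode(latents):
    """Map latent vectors to codeword indices and write them as a fixed-length bit string."""
    return "".join(format(nearest_index(v), f"0{BITS_PER_INDEX}b") for v in latents)

def decode(bitstring):
    """Recover codewords by integer table lookup; no probability estimate is involved."""
    return [CODEBOOK[int(bitstring[i:i + BITS_PER_INDEX], 2)]
            for i in range(0, len(bitstring), BITS_PER_INDEX)]

latents = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.7, 0.9)]
bits = encode(latents)
reconstruction = decode(bits)
```

Because the bitstream carries only integer indices, any platform that holds the same codebook decodes to the same codewords, regardless of small floating-point differences on the encoder side.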
Abstract
Under certain circumstances, advanced neural video codecs can surpass the most complex traditional codecs in their rate-distortion (RD) performance. One of the main reasons for the high performance of existing neural video codecs is the use of the entropy model, which can provide more accurate probability distribution estimations for compressing the latents. This also implies the rigorous requirement that entropy models running on different platforms should use consistent distribution estimations. However, in cross-platform scenarios, entropy models running on different platforms usually yield inconsistent probability distribution estimations due to floating point computation errors that are platform-dependent, which can cause the decoding side to fail in correctly decoding the compressed bitstream sent by the encoding side. In this paper, we propose a cross-platform video compression…
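The cross-platform failure described in the abstract can be illustrated with a toy example. Floating-point addition is not associative, so two platforms that evaluate the same sum in a different order can disagree in the last bit, and a decoder that compares a code value against that sum can then pick a different symbol. The probabilities, the code value, and the function name below are hypothetical, and the two summation orders merely stand in for two platforms.

```python
# Toy probabilities of the first three symbols; their sum is a cumulative boundary.
p = [0.1, 0.2, 0.3]

# Two summation orders, standing in for two platforms that schedule the
# same floating-point additions differently (an illustrative assumption).
boundary_a = (p[0] + p[1]) + p[2]
boundary_b = p[0] + (p[1] + p[2])

def below_boundary(code_value, boundary):
    """Toy interval test of the kind an arithmetic decoder performs."""
    return code_value < boundary

code_value = 0.6  # hypothetical value recovered from the bitstream
decision_a = below_boundary(code_value, boundary_a)
decision_b = below_boundary(code_value, boundary_b)

print("boundaries equal:", boundary_a == boundary_b)
print("decisions equal: ", decision_a == decision_b)
```

In a real arithmetic decoder a single wrong interval decision corrupts every symbol that follows, which is why entropy models require consistent distribution estimates on the encoding and decoding sides.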
Peer Reviews
Decision: Submitted to ICLR 2024
- The paper is well-written and flows smoothly. - The motivation for the proposed method seems intriguing in the context of neural video compression.
1. **More RD-performance comparison with existing neural video compression methods**: The paper primarily made a comparison with traditional codecs like H.264 and H.265. While Figure 1 demonstrates the artifacts induced by entropy models in cross-platform settings, the paper does not conclusively establish if this issue is prevalent across all neural video compression methods. To strengthen the paper's claims, a more comprehensive RD-performance comparison with a variety of neural methods in sim
1. Novelty of the paper is good. 2. Encouraging results.
1. Related work should be updated.
1. This paper focuses on an important and practical issue of neural video compression. And it proposes one reasonable solution. 2. The design of the multi-stage codebook and window-based cross-attention successfully replaces the common motion compensation and autoregressive modules. 3. The performance of its proposed method is acceptable, which is higher than two common traditional codecs H.265 and H.264.
1. The novelty of this paper may be limited as the overall framework bears resemblance to Mentzer et al.'s (2022) approach, which avoids explicit motion estimation by employing a transform-like architecture. Additionally, the use of vector quantization for compression is not new and may have been inspired by Zhu's work in image compression. Furthermore, the window-based cross-attention appears similar to Swin Transformer. Consequently, the major architecture and designs lack sufficient novelty.
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Advanced Vision and Imaging
Methods: Concatenated Skip Connection · Softmax
