NVC-1B: A Large Neural Video Coding Model
Xihua Sheng, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

TL;DR
This paper introduces NVC-1B, the first billion-parameter neural video coding model, demonstrating significant performance improvements and setting a new state-of-the-art in video compression efficiency.
Contribution
The paper presents the design and development of NVC-1B, the first large-scale neural video coding model with over 1 billion parameters, and analyzes the impact of model size and architecture on compression performance.
Findings
NVC-1B outperforms smaller models in video compression efficiency.
Model architecture influences compression performance, with Transformer-based models showing advantages.
Scaling model size leads to significant improvements in video coding quality.
Abstract
Emerging large models have achieved notable progress in natural language processing and computer vision. However, large models for neural video coding remain unexplored. In this paper, we explore how to build a large neural video coding model. Starting from a small baseline model, we gradually scale up the model sizes of its different coding parts, including the motion encoder-decoder, motion entropy model, contextual encoder-decoder, contextual entropy model, and temporal context mining module, and analyze the influence of model size on video compression performance. We then explore different architectures, including CNN, mixed CNN-Transformer, and Transformer architectures, to implement the neural video coding model and analyze the influence of model architecture on video compression performance. Based on our exploration results, we design the first…
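To give a feel for what scaling up one coding part means in terms of parameter count, here is a minimal sketch. It does not reproduce the paper's architecture: the helper names `conv2d_params` and `stack_params`, the 3x3 kernel, the depth of 8, and the channel widths are all assumptions chosen for illustration. It only shows that, for a plain stack of convolutions, doubling the channel width roughly quadruples the number of parameters.

```python
def conv2d_params(c_in, c_out, kernel=3, bias=True):
    """Parameter count of one 2D convolution: weights plus optional bias."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def stack_params(width, depth, kernel=3):
    """Parameter count of a hypothetical stack of `depth` same-width conv layers."""
    return sum(conv2d_params(width, width, kernel) for _ in range(depth))


if __name__ == "__main__":
    # Widths and depth are illustrative only, not values taken from the paper.
    for width in (64, 128, 256, 512):
        print(f"width={width:4d}  params={stack_params(width, depth=8):,}")
```

Because the weight term grows with the product of input and output channels, widening a module is a much faster route to a large parameter budget than adding a few layers at fixed width.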
Taxonomy
Topics: Image and Signal Denoising Methods · Advanced Data Compression Techniques · Advanced Image Processing Techniques
Methods: Attention Is All You Need · Label Smoothing · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention
