NVC-1B: A Large Neural Video Coding Model

Xihua Sheng; Chuanbo Tang; Li Li; Dong Liu; Feng Wu

arXiv:2407.19402 · cs.CV · July 30, 2024 · 1 citation

TL;DR

This paper introduces NVC-1B, the first billion-parameter neural video coding model, demonstrating significant performance improvements and setting a new state-of-the-art in video compression efficiency.

Contribution

The paper presents the design and development of NVC-1B, the first large-scale neural video coding model with over 1 billion parameters, and analyzes the impact of model size and architecture on compression performance.

Findings

01. NVC-1B outperforms smaller models in video compression efficiency.

02. Model architecture influences compression performance, with Transformer-based models showing advantages.

03. Scaling model size leads to significant improvements in video coding quality.

Abstract

Emerging large models have achieved notable progress in natural language processing and computer vision. However, large models for neural video coding remain unexplored. In this paper, we explore how to build a large neural video coding model. Starting from a small baseline model, we gradually scale up the model sizes of its different coding parts, including the motion encoder-decoder, motion entropy model, contextual encoder-decoder, contextual entropy model, and temporal context mining module, and analyze the influence of model size on video compression performance. Then, we explore using different architectures, including CNN, mixed CNN-Transformer, and Transformer architectures, to implement the neural video coding model and analyze the influence of model architecture on video compression performance. Based on our exploration results, we design the first…
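
To make the per-part scaling idea concrete, here is a minimal Python sketch (not the authors' code) of how widening one coding part changes the overall parameter budget. The module names come from the abstract. The channel widths, layer counts, and the plain 3x3 convolution stack are hypothetical placeholders chosen only for illustration; the paper's actual layer configurations are not given here.

```python
def conv2d_params(c_in, c_out, kernel=3, bias=True):
    """Learnable parameters in one 2-D convolution layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def stack_params(channels, depth, kernel=3):
    """Parameters of `depth` same-width conv layers (channels -> channels)."""
    return depth * conv2d_params(channels, channels, kernel)


# Module names follow the abstract. The (width, depth) pairs are
# HYPOTHETICAL placeholders, not values reported in the paper.
BASELINE = {
    "motion_encoder_decoder": (64, 8),
    "motion_entropy_model": (64, 4),
    "contextual_encoder_decoder": (96, 12),
    "contextual_entropy_model": (96, 6),
    "temporal_context_mining": (48, 6),
}


def total_params(config):
    """Sum the parameters of every module in a configuration."""
    return sum(stack_params(width, depth) for width, depth in config.values())


def scaled(config, module, width_scale):
    """Return a copy of `config` with one module's width multiplied."""
    out = dict(config)
    width, depth = out[module]
    out[module] = (width * width_scale, depth)
    return out


if __name__ == "__main__":
    base = total_params(BASELINE)
    # Widen one module at a time and compare against the baseline total.
    for name in BASELINE:
        bigger = total_params(scaled(BASELINE, name, 2))
        print(f"{name}: {base:,} -> {bigger:,} parameters")
```

Because a same-width convolution has roughly k·k·C² weights, doubling a module's width roughly quadruples that module's parameter count, so the choice of which coding part to scale has a large effect on the total budget.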

Taxonomy

Topics: Image and Signal Denoising Methods · Advanced Data Compression Techniques · Advanced Image Processing Techniques

Methods: Attention Is All You Need · Label Smoothing · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention