VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision
Xihua Sheng, Li Li, Dong Liu, Houqiang Li

TL;DR
The paper introduces VNVC, a neural video coding framework that efficiently supports both video reconstruction and direct analysis tasks by learning compact representations, reducing complexity and improving versatility for human and machine vision applications.
Contribution
VNVC is the first neural video coding framework enabling direct task execution on compressed features, combining high compression efficiency with versatile application support.
Findings
Achieves high compression efficiency for video reconstruction.
Provides satisfactory performance for video analysis tasks.
Reduces computational complexity relative to the conventional pipeline of decoding to pixels before enhancement or analysis.
Abstract
Almost all digital videos are coded into compact representations before being transmitted. Such compact representations need to be decoded back to pixels before being displayed to humans and, as usual, before being enhanced/analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. Therefore, we propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis, thereby being versatile for both human and machine vision. Our VNVC framework has a feature-based compression loop. In the loop, one frame is encoded into compact representations and decoded to an intermediate feature that is obtained before performing reconstruction. The intermediate feature can be used as reference in motion…
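As a rough illustration of the feature-based loop described in the abstract, the following Python sketch encodes each frame into a compact representation, decodes it to an intermediate feature, and then either reconstructs pixels or runs a task directly on that feature. Everything here is a hypothetical stand-in rather than the paper's method: the names `encode`, `decode_to_feature`, `reconstruct`, `analyze`, and `run_loop`, the quantization step `STEP`, and the toy brightness "task" are assumptions, and plain feature-difference prediction substitutes for whatever motion handling the paper actually uses.

```python
from typing import List, Optional, Tuple

Frame = List[int]        # toy frame: flat list of 8-bit pixel values
Feature = List[float]    # toy intermediate feature
Code = List[int]         # toy compact representation (quantized symbols)

STEP = 1.0 / 64.0        # hypothetical quantization step (assumption)


def encode(frame: Frame, ref: Optional[Feature]) -> Code:
    """Quantize the difference between the frame's feature and the reference."""
    feat = [p / 255.0 for p in frame]  # stand-in for a learned transform
    pred = ref if ref is not None else [0.0] * len(feat)
    return [round((f - r) / STEP) for f, r in zip(feat, pred)]


def decode_to_feature(code: Code, ref: Optional[Feature]) -> Feature:
    """Decode the compact representation to the intermediate feature."""
    pred = ref if ref is not None else [0.0] * len(code)
    return [r + q * STEP for q, r in zip(code, pred)]


def reconstruct(feature: Feature) -> Frame:
    """Map the intermediate feature back to pixels (human-vision path)."""
    return [min(255, max(0, round(f * 255.0))) for f in feature]


def analyze(feature: Feature) -> str:
    """Toy machine-vision task that reads the feature, never the pixels."""
    mean = sum(feature) / len(feature)
    return "bright" if mean >= 0.5 else "dark"


def run_loop(
    frames: List[Frame], reconstruct_pixels: bool = True
) -> List[Tuple[str, Optional[Frame]]]:
    """Code frames one by one, carrying the decoded feature as the reference."""
    ref: Optional[Feature] = None
    outputs = []
    for frame in frames:
        code = encode(frame, ref)
        feature = decode_to_feature(code, ref)
        label = analyze(feature)  # direct analysis on the feature
        pixels = reconstruct(feature) if reconstruct_pixels else None
        outputs.append((label, pixels))
        ref = feature  # the reference is a feature, not a decoded frame
    return outputs
```

Calling `run_loop(frames, reconstruct_pixels=False)` skips the pixel path entirely while still producing task outputs, which is the efficiency argument the abstract makes in miniature.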
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
