Fast Encoding and Decoding for Implicit Video Representation

Hao Chen; Saining Xie; Ser-Nam Lim; Abhinav Shrivastava

arXiv:2409.19429 · cs.CV · October 16, 2024

Open Access

TL;DR

This paper speeds up both encoding and decoding of implicit video representations: a transformer-based hyper-network replaces per-video gradient-based optimization for encoding, and a parallel decoder speeds up video loading.

Contribution

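The authors introduce NeRV-Enc, a transformer-based hyper-network that makes encoding 10,000x faster by eliminating gradient-based optimization, and NeRV-Dec, a parallel decoder that loads video 11x faster than conventional codecs while the stored representation is smaller than pre-decoded video.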

Findings

01

NeRV-Enc makes encoding 10,000 times faster by eliminating gradient-based optimization.

02

NeRV-Dec achieves 11 times faster decoding than conventional codecs.

03

Compared with loading pre-decoded videos, NeRV-Dec is more efficient, and the stored representation is smaller.

Abstract

Despite the abundant availability and content richness for video data, its high-dimensionality poses challenges for video research. Recent advancements have explored the implicit representation for videos using neural networks, demonstrating strong performance in applications such as video compression and enhancement. However, the prolonged encoding time remains a persistent challenge for video Implicit Neural Representations (INRs). In this paper, we focus on improving the speed of video encoding and decoding within implicit representations. We introduce two key components: NeRV-Enc, a transformer-based hyper-network for fast encoding; and NeRV-Dec, a parallel decoder for efficient video loading. NeRV-Enc achieves an impressive speed-up of $10^{4}\times$ by eliminating gradient-based optimization. Meanwhile, NeRV-Dec simplifies video decoding, outperforming conventional codecs…
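
The contrast the abstract draws between gradient-based encoding and hyper-network encoding can be illustrated with a toy sketch. This is not the authors' NeRV-Enc. It assumes a tiny "implicit representation" f(t) = a*t + b over scalar frame intensities, and it swaps in a fixed linear map (the least-squares solution) as a stand-in for a trained transformer hyper-network. All function names are hypothetical.

```python
# Toy illustration only: a 2-parameter "INR" f(t) = a*t + b fitted to a
# "video" given as one scalar intensity per frame. Not the NeRV architecture.

def frame_times(num_frames):
    """Normalized frame indices in [0, 1]."""
    return [i / (num_frames - 1) for i in range(num_frames)]


def encode_by_gradient_descent(video, ts, steps=2000, lr=0.5):
    """Per-video encoding: fit (a, b) by iterating gradient steps on the MSE."""
    a, b = 0.0, 0.0
    n = len(video)
    for _ in range(steps):
        residuals = [a * t + b - v for t, v in zip(ts, video)]
        grad_a = sum(2 * r * t for r, t in zip(residuals, ts)) / n
        grad_b = sum(2 * r for r in residuals) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def make_linear_encoder(num_frames):
    """Stand-in for a hyper-network: a fixed map from frames to INR weights.

    Assumption: for this linear toy model the map is the closed-form
    least-squares solution; a real hyper-network would be learned from data.
    """
    ts = frame_times(num_frames)
    n, s_t, s_tt = num_frames, sum(ts), sum(t * t for t in ts)
    det = n * s_tt - s_t ** 2
    coef_a = [(n * t - s_t) / det for t in ts]
    coef_b = [(s_tt - s_t * t) / det for t in ts]

    def encode(video):
        # One forward pass: two dot products, no optimization loop.
        a = sum(c * v for c, v in zip(coef_a, video))
        b = sum(c * v for c, v in zip(coef_b, video))
        return a, b

    return encode, ts


def decode(weights, ts):
    """Each frame depends only on its own t, so frames can be decoded independently."""
    a, b = weights
    return [a * t + b for t in ts]


if __name__ == "__main__":
    encode, ts = make_linear_encoder(8)
    video = [0.2 + 0.5 * t for t in ts]
    print("forward pass :", encode(video))
    print("gradient fit :", encode_by_gradient_descent(video, ts))
```

Both routes arrive at essentially the same weights for this toy video, but the first needs thousands of update steps per video while the second needs a single pass. That gap is the kind of saving the abstract attributes to eliminating gradient-based optimization; the actual speed-up figures come from the paper, not from this sketch.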

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Advanced Data Compression Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus