Fast Encoding and Decoding for Implicit Video Representation
Hao Chen, Saining Xie, Ser-Nam Lim, Abhinav Shrivastava

TL;DR
This paper accelerates both the encoding and the decoding of implicit video representations, using a transformer-based hyper-network for encoding and a parallel decoder for video loading.
Contribution
The authors introduce NeRV-Enc, a transformer-based hyper-network that encodes 10,000x faster by eliminating gradient-based optimization, and NeRV-Dec, a parallel decoder that loads video 11x faster than conventional codecs.
Findings
NeRV-Enc speeds up encoding by 10,000 times by eliminating gradient-based optimization.
NeRV-Dec achieves 11 times faster decoding than conventional codecs.
NeRV-Dec also loads video more efficiently than pre-decoded videos, while being smaller in size.
Abstract
Despite the abundant availability and content richness of video data, its high dimensionality poses challenges for video research. Recent advancements have explored the implicit representation of videos using neural networks, demonstrating strong performance in applications such as video compression and enhancement. However, the prolonged encoding time remains a persistent challenge for video Implicit Neural Representations (INRs). In this paper, we focus on improving the speed of video encoding and decoding within implicit representations. We introduce two key components: NeRV-Enc, a transformer-based hyper-network for fast encoding; and NeRV-Dec, a parallel decoder for efficient video loading. NeRV-Enc achieves an impressive speed-up of 10,000x by eliminating gradient-based optimization. Meanwhile, NeRV-Dec simplifies video decoding, outperforming conventional codecs…
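The contrast the abstract draws, between fitting an implicit representation by gradient descent and predicting its weights in one forward pass, can be illustrated with a toy sketch. This is not the authors' architecture. The decoder here is a single linear layer, and `ToyHyperNetwork` is an untrained random linear map standing in for the trained, transformer-based NeRV-Enc. All names and sizes (`frame_embedding`, `decode`, `encode_by_optimization`, `T`, `H`, `W`, `D`) are hypothetical and chosen only for illustration.

```python
import numpy as np

T, H, W = 8, 4, 4   # frames, height, width (toy sizes, assumed)
D = 16              # frame-embedding size (assumed)


def frame_embedding(t_idx, dim=D):
    """Sinusoidal embedding of normalized frame indices, shape (n, dim)."""
    t = np.asarray(t_idx, dtype=np.float64)[:, None] / T
    freqs = 2.0 ** np.arange(dim // 2)[None, :]
    return np.concatenate(
        [np.sin(np.pi * freqs * t), np.cos(np.pi * freqs * t)], axis=1
    )


def decode(weights, t_idx):
    """Toy implicit decoder: embedding -> pixels with one linear layer.

    All requested frames are produced by a single batched matmul,
    which is the sense in which decoding here is 'parallel'.
    """
    pixels = frame_embedding(t_idx) @ weights
    return pixels.reshape(len(t_idx), H, W)


def mse(weights, video):
    """Mean squared reconstruction error over the whole video."""
    return float(np.mean((decode(weights, np.arange(T)) - video) ** 2))


def encode_by_optimization(video, steps=2000, lr=0.05):
    """Baseline encoding: fit the weights to one video by gradient descent."""
    emb = frame_embedding(np.arange(T))
    target = video.reshape(T, -1)
    weights = np.zeros((D, H * W))
    for _ in range(steps):
        grad = 2.0 * emb.T @ (emb @ weights - target) / T
        weights -= lr * grad
    return weights


class ToyHyperNetwork:
    """Stand-in encoder: maps a video to decoder weights in one forward pass.

    Assumption: a fixed random projection replaces the trained
    transformer, so the predicted weights are not meaningful. Only the
    interface (video in, weights out, no optimization loop) is shown.
    """

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(scale=0.01, size=(T * H * W, D * H * W))

    def __call__(self, video):
        return (video.reshape(-1) @ self.proj).reshape(D, H * W)


if __name__ == "__main__":
    video = np.random.default_rng(1).random((T, H, W))
    slow_weights = encode_by_optimization(video)   # thousands of update steps
    fast_weights = ToyHyperNetwork()(video)        # one forward pass
    print(slow_weights.shape, fast_weights.shape)
    print(decode(slow_weights, np.arange(T)).shape)
```

In this sketch the optimization baseline runs 2,000 update steps per video, while the hyper-network call is a single matrix product. Any speed-up comes from removing the per-video loop, at the cost of training the encoder once beforehand, which the toy version skips.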
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Advanced Data Compression Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus
