Boosting Neural Representations for Videos with a Conditional Decoder
Xinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang

TL;DR
This paper presents a universal boosting framework for implicit neural video representations, improving their reconstruction quality, convergence speed, and codec performance through a conditional decoder and novel feature generation methods.
Contribution
The paper introduces a conditional decoder that combines a temporal-aware affine transform with sinusoidal NeRV-like blocks to enhance implicit video representations, an approach described as previously unexplored.
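The idea of a temporal-aware affine transform can be illustrated with a minimal sketch. It assumes the frame index is normalized to [0, 1], encoded with sine and cosine features, and mapped by a linear layer to a per-channel scale and shift. The names `positional_encoding` and `TemporalAffine`, the frequency base, and the random weights are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def positional_encoding(t, num_freqs=4, base=1.25):
    """Encode a normalized frame index t in [0, 1] as sin/cos features.

    The frequency schedule (base ** k * pi) is an assumption for this sketch.
    """
    freqs = base ** np.arange(num_freqs) * np.pi
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])


class TemporalAffine:
    """Hypothetical per-channel affine transform conditioned on the frame index."""

    def __init__(self, channels, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * num_freqs
        self.num_freqs = num_freqs
        # Random weights stand in for parameters that would be learned.
        self.w_gamma = rng.normal(0.0, 0.1, size=(channels, in_dim))
        self.w_beta = rng.normal(0.0, 0.1, size=(channels, in_dim))

    def __call__(self, features, t):
        # features has shape (C, H, W); t is the normalized frame index.
        emb = positional_encoding(t, self.num_freqs)
        gamma = 1.0 + self.w_gamma @ emb  # per-channel scale, centered at identity
        beta = self.w_beta @ emb          # per-channel shift
        return features * gamma[:, None, None] + beta[:, None, None]


features = np.ones((8, 4, 4))
layer = TemporalAffine(channels=8)
early = layer(features, t=0.1)
late = layer(features, t=0.9)
print(early.shape, np.allclose(early, late))
```

In this sketch the same intermediate features are modulated differently for different frame indices, which is the sense in which the frame index acts as a condition on the decoder.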
Findings
Boosts baseline INRs' reconstruction quality and convergence speed.
Achieves superior inpainting and interpolation results.
Outperforms baseline INRs and rivals traditional codecs in rate-distortion performance.
Abstract
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing, showing remarkable versatility across various video tasks. However, existing methods often fail to fully leverage their representation capabilities, primarily due to inadequate alignment of intermediate features during target frame decoding. This paper introduces a universal boosting framework for current implicit video representation approaches. Specifically, we utilize a conditional decoder with a temporal-aware affine transform module, which uses the frame index as a prior condition to effectively align intermediate features with target frames. Besides, we introduce a sinusoidal NeRV-like block to generate diverse intermediate features and achieve a more balanced parameter distribution, thereby enhancing the model's capacity. With a high-frequency information-preserving…
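As a rough illustration of what a sinusoidal NeRV-like block might look like, the sketch below assumes the block follows a convolution, pixel-shuffle, activation pattern with the activation replaced by a sine. To stay short it uses a 1x1 convolution (plain channel mixing) instead of a spatial kernel. The function names and shapes are hypothetical, and the block in the paper may differ in its details.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange (C * r * r, H, W) into (C, H * r, W * r)."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # reorder to (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)


def sinusoidal_block(x, weight, bias, r):
    """Hypothetical upsampling block: 1x1 conv, pixel shuffle, sine activation.

    x:      (C_in, H, W) input features
    weight: (C_out * r * r, C_in) channel-mixing matrix (a 1x1 convolution)
    bias:   (C_out * r * r,) bias vector
    """
    y = np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]
    y = pixel_shuffle(y, r)
    return np.sin(y)  # sine in place of a GELU/ReLU-style activation


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5, 5))
weight = rng.normal(size=(2 * 2 * 2, 3))  # C_out = 2, upscale factor r = 2
bias = np.zeros(2 * 2 * 2)
out = sinusoidal_block(x, weight, bias, r=2)
print(out.shape)
```

The periodic activation keeps outputs bounded in [-1, 1] and is one plausible way to obtain more varied intermediate features than a monotonic activation would give.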
Taxonomy
Topics: Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · ALIGN · Inpainting
