Efficient Neural Video Representation with Temporally Coherent Modulation
Seungjun Shin, Suji Kim, Dokwan Oh

TL;DR
This paper introduces NVTM, a neural video representation that captures the dynamic features of video efficiently. It encodes faster and reaches better quality than existing grid-type approaches, with applications in compression and enhancement tasks.
Contribution
NVTM is a new framework that decomposes video data into 2D grids with flow, enabling rapid, parameter-efficient encoding of dynamic videos, outperforming prior grid-based methods.
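To make the idea of "2D grids with flow" concrete, here is a minimal sketch of a flow-warped grid lookup. It is not the authors' implementation: the function names (`bilinear_sample`, `lookup_with_flow`, `constant_motion`), the single scalar-valued grid, and the hand-written flow function are all assumptions made for illustration. In a trained model the grid entries and the flow would presumably be learned rather than fixed.

```python
import math

def bilinear_sample(grid, x, y):
    """Bilinearly interpolate a 2D grid (list of rows) at continuous (x, y)."""
    h, w = len(grid), len(grid[0])
    # Clamp the query to the grid bounds so border lookups stay valid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def lookup_with_flow(grid, x, y, t, flow_fn):
    """Shift the pixel coordinate by a time-dependent flow, then read the 2D grid."""
    dx, dy = flow_fn(x, y, t)
    return bilinear_sample(grid, x + dx, y + dy)

def constant_motion(x, y, t):
    """Hypothetical flow: content moves right by one pixel per frame."""
    return (-1.0 * t, 0.0)

# A tiny 2x4 grid of scalar features (illustrative values only).
grid = [[0, 1, 2, 3],
        [10, 11, 12, 13]]

# The pixel at x=2 in frame 0 and the pixel at x=3 in frame 1 hit the same grid entry.
frame0_feature = lookup_with_flow(grid, 2.0, 0.0, 0, constant_motion)
frame1_feature = lookup_with_flow(grid, 3.0, 0.0, 1, constant_motion)
```

Because temporally corresponding pixels resolve to the same grid location, one stored feature serves several frames. That is the intuition behind the parameter savings claimed over grids that ignore motion.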
Findings
Encodes over 3x faster than NeRV-style methods.
Improves PSNR/LPIPS over existing grid-type approaches by 1.54 dB/0.019 on UVG and 1.84 dB/0.013 on MCL-JCV.
Achieves compression performance comparable to H.264 and HEVC.
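For readers less familiar with the PSNR figures above, the following sketch shows the standard PSNR definition and how a gain in dB maps to a reduction in mean squared error. The helper names (`psnr`, `mse_ratio_for_gain`) are illustrative and not taken from the paper, and the 8-bit peak value of 255 is an assumption about the evaluation setup.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """PSNR in dB between two equally sized flat pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)

# MSE reduction factors implied by the reported PSNR gains.
uvg_factor = mse_ratio_for_gain(1.54)
mcl_factor = mse_ratio_for_gain(1.84)
```

Since PSNR is logarithmic in MSE, a 1.54 dB gain corresponds to roughly 30% lower mean squared error, and a 1.84 dB gain to roughly 35% lower.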
Abstract
Implicit neural representations (INR) have found successful applications across diverse domains. To employ INR in real-world settings, it is important to speed up training. In the field of INR for video applications, the state-of-the-art approach employs grid-type parametric encoding and achieves a faster encoding speed than its predecessors. However, its grid usage does not consider the video's dynamic nature, which leads to redundant use of trainable parameters. As a result, the grid-type approach has significantly lower parameter efficiency and a higher bitrate than NeRV-style methods, which do not use parametric encoding. To address this problem, we propose Neural Video representation with Temporally coherent Modulation (NVTM), a novel framework that can capture the dynamic characteristics of video. By decomposing the spatio-temporal 3D video data into a set of 2D grids with flow…
