Efficient Neural Video Representation with Temporally Coherent Modulation

Seungjun Shin; Suji Kim; Dokwan Oh

arXiv:2505.00335·cs.CV·May 2, 2025

TL;DR

This paper introduces NVTM, a novel neural video representation method that captures dynamic video features efficiently, achieving faster encoding speeds and better quality compared to existing grid-type approaches, with applications in compression and enhancement tasks.

Contribution

NVTM is a new framework that decomposes video data into 2D grids with flow, enabling rapid, parameter-efficient encoding of dynamic videos, outperforming prior grid-based methods.

Findings

1. Over 3x faster encoding speed than NeRV-style methods.

2. Improves PSNR/LPIPS by 1.54 dB/0.019 on UVG and 1.84 dB/0.013 on MCL-JCV.

3. Achieves comparable performance to H.264 and HEVC in compression.
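The PSNR gains quoted in the findings above are easier to interpret with the metric's definition in hand. The following is a minimal sketch of the standard PSNR formula, 10 * log10(MAX^2 / MSE), not the authors' evaluation code. The function name `psnr`, the flat-list input format, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumption: pixels are given as flat sequences of numbers and the peak
    value is 255 (8-bit video). Neither detail is taken from the paper.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must contain the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 10 grey levels, so MSE = 100.
reference = [100, 100, 100, 100]
reconstruction = [110, 90, 110, 90]
print(f"{psnr(reference, reconstruction):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, a gain of 1.54 dB at a fixed peak value corresponds to an MSE ratio of 10^0.154, about 1.43, which is roughly a 30% reduction in MSE.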

Abstract

Implicit neural representations (INR) have found successful applications across diverse domains. To employ INR in real-world settings, it is important to speed up training. In the field of INR for video applications, the state-of-the-art approach employs grid-type parametric encoding and successfully achieves a faster encoding speed in comparison to its predecessors. However, the grid usage, which does not consider the video's dynamic nature, leads to redundant use of trainable parameters. As a result, it has significantly lower parameter efficiency and higher bitrate compared to NeRV-style methods that do not use a parametric encoding. To address the problem, we propose Neural Video representation with Temporally coherent Modulation (NVTM), a novel framework that can capture dynamic characteristics of video. By decomposing the spatio-temporal 3D video data into a set of 2D grids with flow…
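The abstract is cut off before it describes how the 2D grids and flow are used, so the following is only a generic illustration of the idea of a flow-guided 2D grid lookup, not NVTM's actual architecture. In this sketch, a pixel (x, y) in frame t is displaced by a flow vector and then read from a single shared 2D grid with bilinear interpolation. The names `bilinear_sample`, `lookup_with_flow`, and `pan_right`, the scalar grid entries, and the constant-pan flow are all hypothetical.

```python
import math


def bilinear_sample(grid, x, y):
    """Bilinearly interpolate a 2D grid (a list of rows) at continuous (x, y)."""
    h, w = len(grid), len(grid[0])
    # Clamp so that warped coordinates outside the grid remain valid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def lookup_with_flow(grid, x, y, t, flow):
    """Read the feature for pixel (x, y) of frame t from one shared 2D grid.

    `flow` is a hypothetical callable returning the displacement (dx, dy)
    that maps the pixel to its location in the grid.
    """
    dx, dy = flow(x, y, t)
    return bilinear_sample(grid, x + dx, y + dy)


# Toy 2x4 grid of scalar features (assumed values, for illustration only).
grid = [[0.0, 1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0, 7.0]]


def pan_right(x, y, t):
    """Assumed flow: the scene shifts right by one pixel per frame."""
    return (-float(t), 0.0)


a = lookup_with_flow(grid, 1.0, 0.0, 0, pan_right)  # frame 0, pixel x = 1
b = lookup_with_flow(grid, 2.0, 0.0, 1, pan_right)  # frame 1, pixel x = 2
print(a, b)  # both read the same grid entry
```

In this toy setup, the two temporally corresponding pixels resolve to the same grid entry, so one stored value serves both frames. That is the kind of parameter sharing a per-frame or 3D grid would not get for free, which is the redundancy the abstract attributes to existing grid-type encodings.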
