MSNeRV: Neural Video Representation with Multi-Scale Feature Fusion

Jun Zhu; Xinfeng Zhang; Lv Tang; JunHao Jiang

arXiv:2506.15276 · cs.CV · June 19, 2025

TL;DR

MSNeRV introduces a multi-scale feature fusion framework for neural video representation. It improves detail retention and compression efficiency over existing INR-based methods and surpasses VTM-23.7 in dynamic scenarios.

Contribution

The paper presents a novel multi-scale feature fusion approach with a scale-adaptive loss, enhancing INR-based video compression and representation capabilities.

Findings

01. Outperforms existing INR-based methods in detail retention and compression efficiency.

02. Surpasses VTM-23.7 in dynamic video compression scenarios.

03. Demonstrates superior representation performance on the HEVC and UVG datasets.

Abstract

Implicit neural representations (INRs) have emerged as a promising approach for video compression, achieving performance comparable to state-of-the-art codecs such as H.266/VVC. However, existing INR-based methods struggle to represent detail-intensive and fast-changing video content effectively. This limitation stems mainly from the underutilization of internal network features and the absence of video-specific considerations in network design. To address these challenges, we propose MSNeRV, a multi-scale feature fusion framework for neural video representation. In the encoding stage, we enhance temporal consistency by employing temporal windows and divide the video into multiple Groups of Pictures (GoPs), where a GoP-level grid is used for background representation. Additionally, we design a multi-scale spatial decoder with a scale-adaptive loss function to integrate…
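The abstract mentions two pieces of frame bookkeeping in the encoding stage: splitting the video into GoPs and using temporal windows. The sketch below shows only that indexing, as one plausible reading. It is not the authors' implementation. The function names `split_into_gops` and `temporal_window`, the parameters `gop_size` and `radius`, and the choice to clamp windows at the video boundaries are all assumptions introduced here for illustration.

```python
from typing import List


def split_into_gops(num_frames: int, gop_size: int) -> List[List[int]]:
    """Partition frame indices 0..num_frames-1 into consecutive GoPs.

    gop_size is a hypothetical hyperparameter; the last GoP may be shorter.
    """
    if num_frames < 0 or gop_size <= 0:
        raise ValueError("num_frames must be >= 0 and gop_size must be > 0")
    return [
        list(range(start, min(start + gop_size, num_frames)))
        for start in range(0, num_frames, gop_size)
    ]


def temporal_window(t: int, num_frames: int, radius: int) -> List[int]:
    """Return indices of frames within `radius` of frame t.

    The window is clamped to the video bounds (an assumption; the source
    does not say how boundaries or GoP edges are handled).
    """
    if not 0 <= t < num_frames:
        raise ValueError("t must be a valid frame index")
    if radius < 0:
        raise ValueError("radius must be >= 0")
    first = max(0, t - radius)
    last = min(num_frames - 1, t + radius)
    return list(range(first, last + 1))


# Example: a 10-frame clip, GoPs of 4 frames, windows of radius 2.
gops = split_into_gops(10, 4)
window_mid = temporal_window(5, 10, 2)
window_start = temporal_window(0, 10, 2)
```

In a setup like this, each GoP would index its own background grid while the temporal window selects the neighbouring frames fed to the encoder for frame `t`. Whether the paper's windows cross GoP boundaries is not stated in the abstract.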

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition