GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting
Longan Wang, Yuang Shi, Wei Tsang Ooi

TL;DR
GSVC represents and compresses video as a set of 2D Gaussian splats. It exploits temporal redundancy between adjacent frames and adapts the number of splats per frame, reaching rate-distortion performance comparable to AV1 and VVC and rendering 1080p video at 1500 fps.
Contribution
The paper presents GSVC, an approach that uses 2D Gaussian splats for video compression. It predicts each frame's splats from the previous frame, prunes splats that contribute little to quality, and randomly adds splats to fit large motion or newly appeared objects.
Findings
Achieves rate-distortion performance comparable to AV1 and VVC.
Attains rendering speeds of 1500 fps for 1080p videos.
Effectively captures scene dynamics and motion.
Abstract
3D Gaussian splats have emerged as a revolutionary, effective, learned representation for static 3D scenes. In this work, we explore using 2D Gaussian splats as a new primitive for representing videos. We propose GSVC, an approach to learning a set of 2D Gaussian splats that can effectively represent and compress video frames. GSVC incorporates the following techniques: (i) To exploit temporal redundancy among adjacent frames, which can speed up training and improve the compression efficiency, we predict the Gaussian splats of a frame based on its previous frame; (ii) To control the trade-offs between file size and quality, we remove Gaussian splats with low contribution to the video quality; (iii) To capture dynamics in videos, we randomly add Gaussian splats to fit content with large motion or newly-appeared objects; (iv) To handle significant changes in the scene, we detect key…
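The techniques listed in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The splat layout (mean, covariance, color), the additive blending in `render`, the leave-one-out error score in `prune`, and the helper `init_next_frame` are all assumptions made for illustration, since the excerpt does not specify how contribution is measured or how splats are blended.

```python
import numpy as np

# Hypothetical splat layout: (mean [x, y], 2x2 covariance, RGB color).

def render(splats, height, width):
    """Render 2D Gaussian splats to an (H, W, 3) image.

    Assumption: splats are blended by a plain weighted sum.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    image = np.zeros((height, width, 3))
    for mean, cov, color in splats:
        inv = np.linalg.inv(cov)
        dx = xs - mean[0]
        dy = ys - mean[1]
        # Squared Mahalanobis distance of every pixel to the splat center.
        d2 = inv[0, 0] * dx * dx + 2.0 * inv[0, 1] * dx * dy + inv[1, 1] * dy * dy
        weight = np.exp(-0.5 * d2)
        image += weight[..., None] * color
    return image

def prune(splats, target, keep):
    """Keep the `keep` splats whose removal would hurt quality the most.

    Assumption: contribution is scored as the increase in mean squared
    error when one splat is left out. This is quadratic in the number of
    splats and only meant to show the idea.
    """
    height, width, _ = target.shape
    base_err = np.mean((render(splats, height, width) - target) ** 2)
    scores = []
    for i in range(len(splats)):
        without = render(splats[:i] + splats[i + 1:], height, width)
        scores.append(np.mean((without - target) ** 2) - base_err)
    top = np.argsort(scores)[::-1][:keep]
    return [splats[i] for i in sorted(top)]

def init_next_frame(prev_splats, num_new, height, width, rng):
    """Start the next frame from the previous frame's splats, then add
    `num_new` randomly placed splats for new or fast-moving content."""
    splats = [(m.copy(), c.copy(), col.copy()) for m, c, col in prev_splats]
    for _ in range(num_new):
        mean = rng.uniform([0.0, 0.0], [width, height])
        cov = np.eye(2) * 4.0  # arbitrary initial footprint
        color = rng.uniform(0.0, 1.0, size=3)
        splats.append((mean, cov, color))
    return splats
```

In a full pipeline the splat parameters would be optimized against each frame between these steps; the sketch covers only the bookkeeping around that optimization.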
Taxonomy
Topics: Advanced Data Compression Techniques · Video Coding and Compression Technologies · Image and Signal Denoising Methods
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sparse Evolutionary Training
