Generalizable 3D Gaussian Splatting enabled Semantic Coding for Real-Time Immersive Video Communications
Dingxi Yang, Wenqi Guo, Yue Liu, Jungong Han, Zhijin Qin

TL;DR
This paper introduces GS-SCNet, a unified framework combining 3D Gaussian Splatting reconstruction with semantic coding for real-time immersive video, improving compression and rendering quality.
Contribution
It presents the first end-to-end system integrating generalizable 3DGS with semantic coding, enhancing efficiency and robustness in real-time 3D video communication.
Findings
Achieves better rate-distortion trade-offs in synthetic and real datasets.
Demonstrates strong cross-domain generalization and robustness.
Outperforms traditional decoupled paradigms in efficiency and quality.
Abstract
Real-time immersive video communications, particularly high-fidelity 3D telepresence, necessitates a synergistic balance between instantaneous dynamic scene reconstruction and high-efficiency data transmission. While recent advancements in feed-forward 3D Gaussian Splatting (3DGS) have enabled real-time rendering, performing multi-view video coding and 3D reconstruction in a decoupled manner leads to suboptimal compression efficiency and high computational complexity. To address this, we propose GS-SCNet, the first unified end-to-end framework that seamlessly integrates generalizable 3DGS reconstruction with a dedicated deep Semantic Coding pipeline. Our architecture is underpinned by two core technical contributions: (i) we introduce a Disparity-Guided Parallel Semantic Codec that exploits epipolar geometric priors to facilitate cross-view contextual interaction via disparity…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
