Mon3tr: Monocular 3D Telepresence with Pre-built Gaussian Avatars as Amortization
Fangyu Lin, Yingdong Hu, Zhening Liu, Yufan Zhuang, Zehong Lin, Jun Zhang

TL;DR
Mon3tr introduces a monocular 3D telepresence system that creates lifelike avatars using Gaussian splatting, enabling real-time, bandwidth-efficient remote interaction on mobile devices.
Contribution
It is the first to integrate 3D Gaussian splatting (3DGS) based parametric human modeling into telepresence, combining monocular input with amortized computation to produce efficient, high-quality avatars.
Findings
Achieves >28 dB PSNR for novel poses
Operates at ~80 ms latency
Reduces bandwidth by over 1000x compared to point-cloud streaming
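To give a feel for where a bandwidth reduction of this order can come from, the sketch below compares the per-frame payload of a raw colored point cloud with that of a compact parameter vector that drives a pre-built avatar. Every number in it (point count, bytes per point, parameter count, frame rate) is a hypothetical value chosen for illustration, not a figure from the paper, and the function names are invented here.

```python
# Back-of-the-envelope comparison of per-frame payloads.
# All constants below are illustrative assumptions, NOT values from the paper.

def point_cloud_bytes(num_points: int, bytes_per_point: int = 15) -> int:
    """Payload of one uncompressed point-cloud frame.

    Assumes 3 x float32 for position (12 bytes) plus 3 x uint8 for color (3 bytes).
    """
    return num_points * bytes_per_point


def param_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Payload of one frame of pose/expression parameters stored as float32."""
    return num_params * bytes_per_param


def bitrate_mbps(bytes_per_frame: int, fps: int) -> float:
    """Convert a per-frame payload to megabits per second."""
    return bytes_per_frame * 8 * fps / 1e6


NUM_POINTS = 100_000  # hypothetical point count per frame
NUM_PARAMS = 200      # hypothetical size of the motion/expression vector
FPS = 30              # hypothetical frame rate

pc_bytes = point_cloud_bytes(NUM_POINTS)
pp_bytes = param_bytes(NUM_PARAMS)
ratio = pc_bytes / pp_bytes

print(f"point cloud: {pc_bytes} B/frame, {bitrate_mbps(pc_bytes, FPS):.3f} Mbps")
print(f"parameters:  {pp_bytes} B/frame, {bitrate_mbps(pp_bytes, FPS):.3f} Mbps")
print(f"reduction:   {ratio:.0f}x")
```

Under these assumed values the parameter stream is three orders of magnitude smaller than the point-cloud stream. Real systems compress both streams, so the actual ratio depends on the codecs and representations used.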
Abstract
Immersive telepresence aims to transform human interaction in AR/VR applications by enabling lifelike full-body holographic representations for enhanced remote collaboration. However, existing systems rely on hardware-intensive multi-camera setups and demand high bandwidth for volumetric streaming, limiting their real-time performance on mobile devices. To overcome these challenges, we propose Mon3tr, a novel Monocular 3D telepresence framework that integrates 3D Gaussian splatting (3DGS) based parametric human modeling into telepresence for the first time. Mon3tr adopts an amortized computation strategy, dividing the process into a one-time offline multi-view reconstruction phase to build a user-specific avatar and a monocular online inference phase during live telepresence sessions. A single monocular RGB camera is used to capture body motions and facial expressions in real time to…
Taxonomy
Topics: Human Pose and Action Recognition · Virtual Reality Applications and Impacts · 3D Shape Modeling and Analysis
