GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. Schwing, Shenlong Wang

TL;DR
GoMAvatar is a real-time, memory-efficient method for creating animatable 3D human avatars from monocular videos, combining Gaussian splatting with mesh deformation for high-quality rendering and pose re-articulation.
Contribution
It introduces the Gaussians-on-Mesh hybrid representation, which builds a high-quality animatable human avatar from a single monocular video, renders it in real time, and remains compatible with rasterization-based graphics pipelines.
Findings
Achieves 43 FPS rendering speed.
Memory usage is only 3.63 MB per subject.
Matches or surpasses current monocular human modeling methods in rendering quality and significantly outperforms them in computational efficiency.
Abstract
We introduce GoMAvatar, a novel approach for real-time, memory-efficient, high-quality animatable human modeling. GoMAvatar takes as input a single monocular video to create a digital avatar capable of re-articulation in new poses and real-time rendering from novel viewpoints, while seamlessly integrating with rasterization-based graphics pipelines. Central to our method is the Gaussians-on-Mesh representation, a hybrid 3D model combining rendering quality and speed of Gaussian splatting with geometry modeling and compatibility of deformable meshes. We assess GoMAvatar on ZJU-MoCap data and various YouTube videos. GoMAvatar matches or surpasses current monocular human modeling algorithms in rendering quality and significantly outperforms them in computational efficiency (43 FPS) while being memory-efficient (3.63 MB per subject).
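To make the hybrid idea concrete, here is a minimal sketch of what attaching Gaussians to a mesh could look like. The abstract does not specify the parameterization, so everything below is an assumption for illustration only: one Gaussian per triangle, centered at the face centroid, with a covariance aligned to the face's tangent frame and a small hypothetical `thickness` along the normal. The function names `face_gaussian` and `gaussians_on_mesh` are made up for this example and are not from the authors' code.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(a):
    length = math.sqrt(dot(a, a))  # assumes a non-degenerate triangle
    return [x / length for x in a]

def face_gaussian(v0, v1, v2, thickness=1e-3):
    """Return (mean, 3x3 covariance) of a Gaussian tied to one triangle.

    Illustrative assumption, not the paper's formulation: the Gaussian
    sits at the centroid and is flattened along the face normal.
    """
    mean = [(v0[i] + v1[i] + v2[i]) / 3.0 for i in range(3)]
    e1, e2 = sub(v1, v0), sub(v2, v0)
    n = normalize(cross(e1, e2))   # face normal
    t = normalize(e1)              # tangent along the first edge
    b = cross(n, t)                # bitangent, completes the frame
    # Per-axis standard deviations derived from the triangle's extent.
    scales = [(math.sqrt(dot(e1, e1)) / 2.0, t),
              (abs(dot(e2, b)) / 2.0, b),
              (thickness, n)]
    # Covariance = sum over axes of s^2 * axis * axis^T.
    cov = [[sum(s * s * ax[i] * ax[j] for s, ax in scales)
            for j in range(3)] for i in range(3)]
    return mean, cov

def gaussians_on_mesh(vertices, faces, thickness=1e-3):
    """Compute one Gaussian per face from the current vertex positions."""
    return [face_gaussian(*(vertices[i] for i in f), thickness=thickness)
            for f in faces]
```

Because each Gaussian is recomputed from the vertex positions, deforming the mesh (for example, re-posing it) moves and reorients the Gaussians with it. This is the property that lets a mesh carry the geometry and articulation while the Gaussians carry the appearance.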
Taxonomy
Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Human Motion and Animation
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
