SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering
Laura Bragagnolo, Leonardo Barcellona, Stefano Ghidoni

TL;DR
SkelSplat introduces a differentiable Gaussian rendering framework for multi-view 3D human pose estimation, enabling robust, ground-truth-free fusion of camera views and improved generalization across diverse scenarios.
Contribution
The paper proposes a skeleton modeling and optimization method based on Gaussian splatting, in which each joint is represented by a 3D Gaussian, improving multi-view pose estimation without relying on 3D ground-truth supervision.
Findings
Outperforms methods that do not rely on 3D ground truth on the Human3.6M and CMU datasets.
Reduces cross-dataset error by up to 47.8%.
Demonstrates robustness to occlusions without scenario-specific fine-tuning.
Abstract
Accurate 3D human pose estimation is fundamental for applications such as augmented reality and human-robot interaction. State-of-the-art multi-view methods learn to fuse predictions across views by training on large annotated datasets, leading to poor generalization when the test scenario differs. To overcome these limitations, we propose SkelSplat, a novel framework for multi-view 3D human pose estimation based on differentiable Gaussian rendering. Human pose is modeled as a skeleton of 3D Gaussians, one per joint, optimized via differentiable rendering to enable seamless fusion of arbitrary camera views without 3D ground-truth supervision. Since Gaussian Splatting was originally designed for dense scene reconstruction, we propose a novel one-hot encoding scheme that enables independent optimization of human joints. SkelSplat outperforms approaches that do not rely on 3D ground truth…
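The one-hot idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a plain pinhole camera, replaces full 3D Gaussian splatting (covariance projection, alpha compositing) with one isotropic 2D Gaussian per projected joint, and uses NumPy without automatic differentiation. The function names `project`, `render_one_hot`, and `multiview_loss`, as well as the camera values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Pinhole projection of (J, 3) world points to (J, 2) pixel coordinates."""
    cam = points_3d @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                     # camera frame -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]

def render_one_hot(centers_2d, sigma, height, width):
    """Splat one isotropic 2D Gaussian per joint (a simplifying assumption).

    Each joint carries a one-hot "colour", so the output has one channel
    per joint and channel j contains only the blob of joint j.
    Returns an array of shape (J, height, width).
    """
    num_joints = centers_2d.shape[0]
    ys, xs = np.mgrid[0:height, 0:width]
    dx = xs[None, :, :] - centers_2d[:, 0][:, None, None]
    dy = ys[None, :, :] - centers_2d[:, 1][:, None, None]
    blobs = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))  # (J, H, W)
    colors = np.eye(num_joints)                                # one-hot colour per joint
    return np.einsum("jc,jhw->chw", colors, blobs)

def multiview_loss(joints_3d, cameras, targets, sigma, height, width):
    """Mean squared error between rendered and target maps, averaged over views."""
    total = 0.0
    for (K, R, t), target in zip(cameras, targets):
        rendered = render_one_hot(project(joints_3d, K, R, t), sigma, height, width)
        total += np.mean((rendered - target) ** 2)
    return total / len(cameras)

# Toy setup (all values are assumptions): one camera, two joints, 32x32 image.
K = np.array([[100.0, 0.0, 16.0], [0.0, 100.0, 16.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])
joints = np.array([[0.0, 0.0, 0.0], [0.2, -0.1, 0.0]])

pixels = project(joints, K, R, t)
image = render_one_hot(pixels, sigma=2.0, height=32, width=32)

# Move only joint 1 and render again: channel 0 should be unaffected.
moved = joints.copy()
moved[1] += np.array([0.1, 0.0, 0.0])
image_moved = render_one_hot(project(moved, K, R, t), sigma=2.0, height=32, width=32)

loss_same = multiview_loss(joints, [(K, R, t)], [image], 2.0, 32, 32)
loss_moved = multiview_loss(moved, [(K, R, t)], [image], 2.0, 32, 32)
```

Because every joint writes to its own channel, moving joint 1 changes only channel 1 of the rendered map, so the per-view error for one joint does not leak into another. In a differentiable framework the same loss, summed over any number of cameras, could be minimized with respect to the 3D joint positions; here the loss is only evaluated, not optimized.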
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Human Motion and Animation
