RUMPL: Ray-Based Transformers for Universal Multi-View 2D to 3D Human Pose Lifting
Seyed Abolfazl Ghasemzadeh, Alexandre Alahi, Christophe De Vleeschouwer

TL;DR
RUMPL is a transformer-based 3D pose lifter that represents 2D keypoints as 3D rays. This makes the model independent of camera calibration and enables universal multi-view human pose estimation without retraining.
Contribution
The paper proposes RUMPL, a novel transformer framework with a ray-based 2D keypoint representation that generalizes across arbitrary multi-view setups without retraining.
Findings
Reduces MPJPE by up to 53% compared to triangulation.
Achieves over 60% improvement over transformer-based baselines.
Demonstrates robustness on in-the-wild multi-view and multi-person datasets.
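The percentages above are stated in terms of MPJPE (mean per-joint position error), the average Euclidean distance between predicted and ground-truth 3D joints. The following is a minimal sketch of that metric and of a relative-reduction calculation. The function names and the toy numbers are illustrative assumptions and do not come from the paper; the paper's evaluation protocol (alignment, joint set, units) is not described in this summary.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between corresponding predicted and ground-truth 3D joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline_error, method_error):
    """Percentage by which method_error is lower than baseline_error."""
    return 100.0 * (baseline_error - method_error) / baseline_error


# Toy values for illustration only (not results from the paper).
toy_pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_gt = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
toy_error = mpjpe(toy_pred, toy_gt)
toy_gain = relative_reduction(80.0, 40.0)
```

Under this definition, a reduction "by up to 53%" means the method's MPJPE is at most 53% lower than the baseline's on the best-case setting, not that it is 53% lower everywhere.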
Abstract
Estimating 3D human poses from 2D images remains challenging due to occlusions and projective ambiguity. Multi-view learning-based approaches mitigate these issues but often fail to generalize to real-world scenarios, as large-scale multi-view datasets with 3D ground truth are scarce and captured under constrained conditions. To overcome this limitation, recent methods rely on 2D pose estimation combined with 2D-to-3D pose lifting trained on synthetic data. Building on our previous MPL framework, we propose RUMPL, a transformer-based 3D pose lifter that introduces a 3D ray-based representation of 2D keypoints. This formulation makes the model independent of camera calibration and the number of views, enabling universal deployment across arbitrary multi-view configurations without retraining or fine-tuning. A new View Fusion Transformer leverages learned fused-ray tokens to aggregate…
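To make the idea of a 3D ray-based representation of 2D keypoints concrete, here is a minimal sketch of standard pinhole back-projection: each 2D keypoint is turned into a world-space ray defined by the camera center and a unit direction. This is an assumption about how such rays can be built, not the paper's confirmed formulation. The function name `keypoint_to_ray`, the intrinsics parameters (`fx`, `fy`, `cx`, `cy`), and the extrinsics convention `x_cam = R @ x_world + t` are all hypothetical choices made for illustration.

```python
import math


def keypoint_to_ray(uv, fx, fy, cx, cy, R, t):
    """Back-project a 2D pixel keypoint into a world-space ray.

    Assumes a distortion-free pinhole camera with extrinsics
    x_cam = R @ x_world + t (R is a 3x3 nested list, t has length 3).
    Returns (origin, direction), with direction normalized to unit length.
    """
    u, v = uv
    # Direction in the camera frame: K^{-1} [u, v, 1]^T.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the world frame: d_world = R^T d_cam.
    d_world = [sum(R[r][c] * d_cam[r] for r in range(3)) for c in range(3)]
    norm = math.sqrt(sum(x * x for x in d_world))
    direction = tuple(x / norm for x in d_world)
    # Camera center in the world frame: C = -R^T t.
    origin = tuple(-sum(R[r][c] * t[r] for r in range(3)) for c in range(3))
    return origin, direction


# Toy camera for illustration: identity rotation, shifted 5 units along z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
origin, direction = keypoint_to_ray(
    (320.0, 240.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0,
    R=identity, t=(0.0, 0.0, 5.0),
)
```

Because the calibration is folded into the ray itself, a model that consumes rays sees the same kind of input regardless of which camera produced it. That is one plausible reading of why such a representation can transfer across camera setups and view counts without retraining.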
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Advanced Vision and Imaging
