RUMPL: Ray-Based Transformers for Universal Multi-View 2D to 3D Human Pose Lifting

Seyed Abolfazl Ghasemzadeh; Alexandre Alahi; Christophe De Vleeschouwer

arXiv:2512.15488·cs.CV·December 18, 2025

Open Access

TL;DR

RUMPL is a transformer-based 3D pose lifter that represents 2D keypoints as 3D rays. This makes the model independent of camera calibration and the number of views, so it can be deployed across arbitrary multi-view setups without retraining.

Contribution

The paper proposes RUMPL, a novel transformer framework with a ray-based 2D keypoint representation that generalizes across arbitrary multi-view setups without retraining.

Findings

01

Reduces mean per-joint position error (MPJPE) by up to 53% compared to triangulation.

02

Achieves over 60% improvement over transformer-based baselines.

03

Demonstrates robustness on in-the-wild multi-view and multi-person datasets.
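
The percentages above are relative changes in MPJPE, the mean Euclidean distance between predicted and ground-truth 3D joints. A minimal sketch of how such a figure is typically computed, in plain Python to stay dependency-free. The function names `mpjpe` and `relative_reduction` are illustrative and not taken from the paper, and the sketch skips any root alignment or per-dataset protocol the authors may apply.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between matched predicted and ground-truth 3D joints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Toy joints, made up for illustration only (not data from the paper).
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
error = mpjpe(pred, gt)  # per-joint distances are 3 and 4
```

A reduction "by up to 53%" then means `relative_reduction(baseline_error, new_error)` reaches about 0.53 in the best reported setting.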

Abstract

Estimating 3D human poses from 2D images remains challenging due to occlusions and projective ambiguity. Multi-view learning-based approaches mitigate these issues but often fail to generalize to real-world scenarios, as large-scale multi-view datasets with 3D ground truth are scarce and captured under constrained conditions. To overcome this limitation, recent methods rely on 2D pose estimation combined with 2D-to-3D pose lifting trained on synthetic data. Building on our previous MPL framework, we propose RUMPL, a transformer-based 3D pose lifter that introduces a 3D ray-based representation of 2D keypoints. This formulation makes the model independent of camera calibration and the number of views, enabling universal deployment across arbitrary multi-view configurations without retraining or fine-tuning. A new View Fusion Transformer leverages learned fused-ray tokens to aggregate…
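
The abstract does not spell out how a 2D keypoint becomes a 3D ray, so the following is only a sketch of one standard construction, not the authors' implementation. It assumes an undistorted pinhole camera with zero skew, intrinsics `fx, fy, cx, cy`, and world-to-camera extrinsics `R, t` (so that `x_cam = R @ x_world + t`). The function name `keypoint_to_ray` and the parameter layout are hypothetical. Expressing every keypoint as an origin and unit direction in a shared world frame is what would let a lifter consume inputs from any camera arrangement in the same form.

```python
import math


def keypoint_to_ray(u, v, fx, fy, cx, cy, R, t):
    """Back-project pixel (u, v) to a world-space ray.

    Assumes a pinhole camera without distortion or skew, and
    world-to-camera extrinsics: x_cam = R @ x_world + t.
    Returns (origin, direction) with a unit-length direction.
    """
    # Ray direction in the camera frame: K^{-1} [u, v, 1]^T.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)

    # Rotate into the world frame with R^T (entry (i, j) of R^T is R[j][i]).
    d_world = [sum(R[j][i] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d_world))
    direction = tuple(c / norm for c in d_world)

    # Camera centre in world coordinates: C = -R^T t.
    origin = tuple(-sum(R[j][i] * t[j] for j in range(3)) for i in range(3))
    return origin, direction


# Illustrative camera: identity rotation, centre at (0, 0, 2) in the world.
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
origin, direction = keypoint_to_ray(
    200.0, 50.0, fx=300.0, fy=300.0, cx=100.0, cy=50.0,
    R=R_identity, t=(0.0, 0.0, -2.0),
)
```

In this toy setup the world point (1, 0, 5) projects to pixel (200, 50), and the returned ray passes back through that point.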


Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Advanced Vision and Imaging