CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
Harry Zhang, Luca Carlone

TL;DR
CHAMP introduces a probabilistic framework for 3D human pose estimation from 2D keypoints, generating multiple hypotheses and using conformal prediction to select the most accurate ones, achieving state-of-the-art results.
Contribution
The paper presents a novel end-to-end differentiable conformal prediction method for multi-hypothesis 3D human pose estimation, improving accuracy and providing probabilistic guarantees.
Findings
State-of-the-art performance on multiple datasets.
Effective hypothesis scoring and aggregation improves accuracy.
Probabilistic guarantees enhance reliability of pose estimates.
Abstract
We introduce CHAMP, a novel method for learning sequence-to-sequence, multi-hypothesis 3D human poses from 2D keypoints by leveraging a conditional distribution with a diffusion model. To predict a single output 3D pose sequence, we generate and aggregate multiple 3D pose hypotheses. For better aggregation results, we develop a method to score these hypotheses during training, effectively integrating conformal prediction into the learning process. This process results in a differentiable conformal predictor that is trained end-to-end with the 3D pose estimator. Post-training, the learned scoring model is used as the conformity score, and the 3D pose estimator is combined with a conformal predictor to select the most accurate hypotheses for downstream aggregation. Our results indicate that using a simple mean aggregation on the conformal prediction-filtered hypotheses set yields competitive…
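The filter-then-average step described in the abstract can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation: the function names `conformal_threshold` and `filter_and_average` are hypothetical, the scores are assumed to be nonconformity scores (lower is better) supplied by some external scoring model rather than CHAMP's learned one, each hypothesis is a single pose given as a list of (x, y, z) joints, and the fallback to the best-scored hypothesis when nothing passes the threshold is an assumption.

```python
import math


def conformal_threshold(calib_scores, alpha):
    """Split-conformal threshold from held-out nonconformity scores."""
    n = len(calib_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: accept everything.
        return math.inf
    return sorted(calib_scores)[rank - 1]


def filter_and_average(hypotheses, scores, threshold):
    """Keep hypotheses with score <= threshold, then take the per-joint mean."""
    kept = [h for h, s in zip(hypotheses, scores) if s <= threshold]
    if not kept:
        # Assumed fallback: use the single best-scored hypothesis.
        best = min(range(len(scores)), key=scores.__getitem__)
        kept = [hypotheses[best]]
    num_joints = len(kept[0])
    mean_pose = [
        tuple(sum(h[j][d] for h in kept) / len(kept) for d in range(3))
        for j in range(num_joints)
    ]
    return mean_pose, len(kept)


if __name__ == "__main__":
    # Toy calibration scores and three two-joint hypotheses (made-up numbers).
    threshold = conformal_threshold(list(range(1, 10)), alpha=0.5)
    hyps = [
        [(0, 0, 0), (1, 1, 1)],
        [(2, 2, 2), (3, 3, 3)],
        [(100, 100, 100), (100, 100, 100)],  # outlier hypothesis
    ]
    pose, kept = filter_and_average(hyps, [1, 4, 9], threshold)
    print(threshold, kept, pose)
```

In this toy run the outlier hypothesis scores above the calibrated threshold and is dropped before averaging, which is the effect the abstract attributes to conformal filtering ahead of mean aggregation.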
Peer Reviews
Decision: ICLR 2025 Poster
1. The proposed conformal prediction is simple yet effective. 2. This paper is well-written and easy to understand.
1. CP removes the predictions with low confidence, and the visualisations also show that after CP the predictions become more certain and accurate. However, the results are mainly shown on easy cases where occlusions are barely present. I am wondering how CP performs on hard cases where occlusions and truncations exist. 2. It would be better if this method could be extended to image-to-3D pipelines.
The application of conformal prediction to 3D human pose estimation is an interesting and original aspect. Multi-hypothesis 3D pose estimation has been a relevant research area due to ambiguities in the task, but practical methods are often not formulated in a proper probabilistic manner. Using conformal prediction techniques is a promising direction and taking a first step in this direction is relevant to the community. The overview of the related works is extensive, and the comparison of them
The method is only evaluated on studio datasets, but not on outdoor ones, such as 3DPW and EMDB. Evaluation on such less restricted videos would be more convincing. It is unclear whether CHAMP's learned conformity scoring performs better than the prior established 2D-joint projection based version (see question below).
1. Learning the uncertainty of multi-hypothesis 3D human pose regression seems reasonable. Aside from building the distribution of 3D human poses, how to select the best ones is an interesting research topic. 2. The designs for scoring the hypotheses via conformal prediction are subtle. A detailed ablation study further reveals the influence of different parameters. 3. The description of the method is clear. Especially, Fig. 2 provides a clear overview of the relationship between different desi
1. Experiments. Tab. 1 and Tab. 2 show the improvement over CHAMP-Naive from introducing conformal prediction (CHAMP) and from further selecting the 3D pose by comparing its 2D projection with the 2D pose (CHAMP-Agg). However, one important experiment seems to be missing here: CHAMP-Naive + -Agg. How well does it perform if the proposed conformal prediction is not used? This experiment would clearly show how important the proposed conformal prediction is. Besides, the quantitative experiments
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
Methods: Diffusion · Sparse Evolutionary Training
