TL;DR
This paper introduces a multi-modal hypergraph reasoning framework that fuses semantic, geometric, and pose cues and trains with contrastive learning to improve 3D crowd mesh recovery under occlusion.
Contribution
It proposes a hypergraph-based approach with contrastive learning to better model crowd dynamics and fuse multi-modal features for 3D reconstruction.
Findings
Achieves state-of-the-art results on Panoptic and GigaCrowd benchmarks.
Effectively handles severe occlusions and depth ambiguities.
Demonstrates the benefit of hypergraph modeling and contrastive learning in crowd reconstruction.
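This summary does not specify the contrastive objective, so the following is only a generic illustration of the idea rather than the paper's loss. It is a minimal InfoNCE-style sketch in plain Python: an anchor embedding is pulled toward one positive and pushed away from several negatives. The function names `cosine` and `info_nce`, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """InfoNCE loss: -log softmax of the positive pair among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidate logits.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]


# Toy embeddings (illustrative only).
anchor = [1.0, 0.0]
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

The loss is small when the positive is the candidate most similar to the anchor and grows when a negative is more similar, which is the pressure that pushes matched cues together in embedding space.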
Abstract
Multi-person 3D reconstruction is pivotal for real-world interaction analysis, yet remains challenging due to severe occlusions and depth ambiguity. Current approaches typically rely on single-modality inputs, which inherently lack geometric guidance. Furthermore, these methods often reconstruct subjects in isolation, neglecting the collective group context essential for resolving ambiguities in crowded scenes. To address these limitations, we propose Contrastive Multi-modal Hypergraph Reasoning to synergize semantic, geometric, and pose cues for crowd reconstruction. We first initialize robust node representations by combining RGB features, geometric priors, and occlusion-aware incomplete poses. Additionally, we introduce a pelvis depth indicator as a global spatial anchor, aligning visual features with a metric-scale-agnostic depth ordering. Subsequently, we construct a…
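The abstract is cut off before it describes the hypergraph itself, so the following is not the authors' construction. As a rough illustration of why a hypergraph suits group context, here is a minimal sketch of one round of generic node-to-hyperedge-to-node mean aggregation, where each node stands for a person and each hyperedge for a group of people. The function name `hypergraph_mean_pass`, the grouping, and the toy features are hypothetical.

```python
from typing import List


def hypergraph_mean_pass(
    features: List[List[float]],
    hyperedges: List[List[int]],
) -> List[List[float]]:
    """One round of node -> hyperedge -> node mean aggregation."""
    dim = len(features[0])

    # Step 1: each hyperedge feature is the mean of its member nodes.
    edge_feats = [
        [sum(features[i][d] for i in members) / len(members) for d in range(dim)]
        for members in hyperedges
    ]

    # Step 2: each node averages the features of the hyperedges it belongs to.
    updated = []
    for node in range(len(features)):
        incident = [e for e, members in enumerate(hyperedges) if node in members]
        if not incident:
            # A node in no group keeps its own feature unchanged.
            updated.append(list(features[node]))
            continue
        updated.append(
            [sum(edge_feats[e][d] for e in incident) / len(incident) for d in range(dim)]
        )
    return updated


# Toy scene (illustrative only): four people with 2-D features.
# People 0, 1, 2 form one group; people 2 and 3 form another.
feats = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0], [7.0, 4.0]]
groups = [[0, 1, 2], [2, 3]]
out = hypergraph_mean_pass(feats, groups)
```

Unlike an ordinary graph edge, a single hyperedge connects any number of people at once, so person 2 here receives context from both groups in one step. Learned weights and attention, which a trained model would normally add, are omitted to keep the sketch short.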
