TL;DR
InterMesh is an explicitly interaction-aware framework for multi-person human mesh recovery. It improves pose estimation accuracy by incorporating human-object interaction information, reducing MPJPE by 9.9% on CMU Panoptic and 8.2% on Hi4D.
Contribution
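InterMesh explicitly models human-environment interactions using a human-object interaction detector. Two lightweight modules, the Contextual Interaction Encoder and the Interaction-Guided Refiner, integrate the resulting interaction features into existing human mesh recovery architectures with minimal overhead.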
Findings
Reduces MPJPE by 9.9% on the CMU Panoptic dataset.
Reduces MPJPE by 8.2% on the Hi4D dataset.
Outperforms state-of-the-art methods on multiple datasets.
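MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints, so the percentages above are relative reductions of that average. The following is a minimal sketch of both calculations, not the paper's evaluation code. It assumes joints are given as (x, y, z) tuples in the same units and skips the root alignment that evaluation protocols often apply first; the function names `mpjpe` and `relative_reduction` are hypothetical.

```python
import math
from typing import Sequence

Joint = Sequence[float]  # one 3D joint as (x, y, z); units are an assumption


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean Euclidean distance between matching predicted and true joints."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # math.dist computes the Euclidean distance between two points.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Toy values for illustration only; these are not results from the paper.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
error = mpjpe(pred, gt)  # joint distances are 5.0 and 0.0
```

Under this definition, a 9.9% reduction means the new error is 90.1% of the baseline error, whatever the baseline's absolute value.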
Abstract
Humans constantly interact with their surroundings. Existing end-to-end multi-person human mesh recovery methods, typically based on the DETR framework, capture inter-human relationships through self-attention across all human queries. However, these approaches model interactions only implicitly and lack explicit reasoning about how humans interact with objects and with each other. In this paper, we propose InterMesh, a simple yet effective framework that explicitly incorporates human-environment interaction information into the human mesh recovery pipeline. By leveraging a human-object interaction detector, InterMesh enriches query representations with structured interaction semantics, enabling more accurate pose and shape estimation. We design lightweight modules, Contextual Interaction Encoder and Interaction-Guided Refiner, to integrate these features into existing HMR architectures…
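The abstract does not say how interaction features are merged into the human queries. One common pattern for this kind of enrichment is cross-attention with a residual connection, sketched below purely as an illustration. This is not the authors' Contextual Interaction Encoder or Interaction-Guided Refiner: the function `enrich_queries` is hypothetical, it uses a single attention head with no learned projections, and it assumes queries and interaction features share the same dimension.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enrich_queries(queries: List[Vector], interactions: List[Vector]) -> List[Vector]:
    """Add an attention-weighted mix of interaction features to each query.

    Hypothetical illustration: scaled dot-product attention from each human
    query over all interaction features, followed by a residual addition.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    enriched = []
    for q in queries:
        # Similarity of this query to every interaction feature.
        scores = [scale * sum(a * b for a, b in zip(q, f)) for f in interactions]
        weights = softmax(scores)
        # Weighted sum of interaction features (the attended context).
        context = [
            sum(w * f[k] for w, f in zip(weights, interactions))
            for k in range(dim)
        ]
        # Residual connection keeps the original query information.
        enriched.append([qk + ck for qk, ck in zip(q, context)])
    return enriched


# Toy example: one query, two interaction features it attends to equally.
out = enrich_queries([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

The residual form means a query with no informative interactions stays close to its original value, which is one reason this pattern suits add-on modules for an existing architecture.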
