VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery

Wenhao Shen; Hao Wang; Wanqi Yin; Fayao Liu; Xulei Yang; Chao Liang; Zhongang Cai; Guosheng Lin

arXiv:2602.19180·cs.CV·February 24, 2026

Open Access

TL;DR

This paper introduces a group preference alignment framework for diffusion-based human mesh recovery (HMR). A critique agent scores predicted meshes, the scores are used to build a group-wise preference dataset, and finetuning on that dataset yields more accurate and physically plausible 3D human meshes from single RGB images.

Contribution

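The paper proposes a dual-memory augmented critique agent with self-reflection, which produces context-aware quality scores for predicted meshes, and a group preference alignment method for finetuning diffusion-based human mesh recovery (HMR) models.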

Findings

01. Achieves superior accuracy over state-of-the-art methods.

02. Produces more physically plausible and image-consistent meshes.

03. Effectively handles occlusion and cluttered scenes.

Abstract

Human mesh recovery (HMR) from a single RGB image is inherently ambiguous, as multiple 3D poses can correspond to the same 2D observation. Recent diffusion-based methods tackle this by generating various hypotheses, but often sacrifice accuracy. They yield predictions that are either physically implausible or drift from the input image, especially under occlusion or in cluttered, in-the-wild scenes. To address this, we introduce a dual-memory augmented HMR critique agent with self-reflection to produce context-aware quality scores for predicted meshes. These scores distill fine-grained cues about 3D human motion structure, physical feasibility, and alignment with the input image. We use these scores to build a group-wise HMR preference dataset. Leveraging this dataset, we propose a group preference alignment framework for finetuning diffusion-based HMR models. This process injects the…
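
The abstract does not say how critique scores are turned into a group-wise preference signal, so the following is only an illustrative sketch and not the authors' method. It assumes each image has a group of mesh hypotheses with one scalar critique score each, and it standardizes the scores within the group so that each hypothesis is judged relative to its siblings. The function names, the dictionary layout, and the within-group standardization are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def group_relative_weights(scores, eps=1e-8):
    """Standardize scores within one group (assumed scheme, not from the paper).

    Returns one weight per score: positive for above-average hypotheses,
    negative for below-average ones, all zeros if every score is equal.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the group
    return [(s - mu) / (sigma + eps) for s in scores]


def build_preference_group(image_id, hypotheses, scores):
    """Bundle the hypotheses for one image into a ranked preference record.

    `hypotheses` can be any per-mesh identifiers or parameter objects;
    `scores` are the critique scores, where higher is assumed to be better.
    """
    if len(hypotheses) != len(scores):
        raise ValueError("hypotheses and scores must have the same length")
    weights = group_relative_weights(scores)
    # Sort from most preferred to least preferred by raw score.
    ranked = sorted(zip(hypotheses, scores, weights),
                    key=lambda item: item[1], reverse=True)
    return {
        "image_id": image_id,
        "ranked": [
            {"hypothesis": h, "score": s, "weight": w} for h, s, w in ranked
        ],
    }


if __name__ == "__main__":
    # Hypothetical scores for three mesh hypotheses of a single image.
    group = build_preference_group("img_0", ["mesh_a", "mesh_b", "mesh_c"],
                                   [0.2, 0.9, 0.5])
    for entry in group["ranked"]:
        print(entry["hypothesis"], round(entry["weight"], 3))
```

Under these assumptions, a finetuning objective could up-weight hypotheses with positive weights and down-weight those with negative weights; how the paper actually uses the scores is not recoverable from the truncated abstract.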

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis