THOM: Generating Physically Plausible Hand-Object Meshes From Text

Uyoung Jeong; Yihalem Yimolal Tiruneh; Hyung Jin Chang; Seungryul Baek; Kwang In Kim

arXiv:2604.02736 · cs.CV · April 14, 2026

TL;DR

THOM is a training-free framework that generates physically plausible 3D hand-object meshes directly from text prompts. It pairs text-guided Gaussian generation with physics-based refinement of the hand-object interaction.

Contribution

THOM introduces a training-free, two-stage pipeline (text-guided Gaussian generation, then physics-based refinement), together with a mesh extraction method whose explicit vertex-to-Gaussian mapping enables topology-aware regularization.

Findings

01. THOM achieves high visual realism and physical plausibility in generated HOIs.

02. The framework aligns well with text prompts and produces reliable, interaction-aware meshes.

03. Extensive experiments validate the effectiveness of the approach.

Abstract

Generating photorealistic 3D hand-object interactions (HOIs) from text is important for applications like robotic grasping and AR/VR content creation. In practice, however, achieving both visual fidelity and physical plausibility remains difficult, as mesh extraction from text-generated Gaussians is inherently ill-posed and the resulting meshes are often unreliable for physics-based optimization. We present THOM, a training-free framework that generates physically plausible 3D HOI meshes directly from text prompts, without requiring template object meshes. THOM follows a two-stage pipeline: it first generates hand and object Gaussians guided by text, and then refines their interaction using physics-based optimization. To enable reliable interaction modeling, we introduce a mesh extraction method with an explicit vertex-to-Gaussian mapping, which enables topology-aware regularization. We…
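
To make the idea of a vertex-to-Gaussian mapping concrete, here is a minimal sketch in Python. It is not the paper's method: the nearest-neighbour assignment, the edge-based penalty, and the names `nearest_gaussian_index` and `edge_regularizer` are all assumptions made for illustration. The sketch only shows how an explicit mapping lets mesh connectivity (topology) constrain the Gaussians that the vertices came from.

```python
import numpy as np


def nearest_gaussian_index(vertices: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Map each mesh vertex to the index of its nearest Gaussian center.

    Assumption: a nearest-neighbour rule stands in for the explicit mapping;
    the paper's actual mapping is not described in the abstract.
    vertices: (V, 3) array, centers: (G, 3) array. Returns a (V,) int array.
    """
    # Pairwise squared distances between vertices and centers, shape (V, G).
    d2 = ((vertices[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def edge_regularizer(centers: np.ndarray,
                     vertex_to_gaussian: np.ndarray,
                     edges: np.ndarray) -> float:
    """Hypothetical topology-aware penalty.

    For every mesh edge (i, j), look up the Gaussians mapped to vertices i and j
    and penalise the squared distance between their centers. Returns the mean
    penalty over all edges.
    """
    gi = vertex_to_gaussian[edges[:, 0]]
    gj = vertex_to_gaussian[edges[:, 1]]
    diff = centers[gi] - centers[gj]
    return float((diff ** 2).sum(axis=-1).mean())


if __name__ == "__main__":
    # Toy triangle and three nearby Gaussian centers (made-up values).
    vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    centers = np.array([[0.1, 0.0, 0.0], [0.9, 0.0, 0.0], [0.0, 1.1, 0.0]])
    edges = np.array([[0, 1], [1, 2], [2, 0]])

    mapping = nearest_gaussian_index(vertices, centers)
    print("vertex -> Gaussian:", mapping)
    print("edge penalty:", edge_regularizer(centers, mapping, edges))
```

In a real pipeline a term like this would be one differentiable loss among several, alongside whatever physics-based objectives the refinement stage uses. Here it is plain NumPy so that the mapping and the lookup through mesh edges are easy to inspect.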
