Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models

Daniel Sungho Jung; Kyoung Mu Lee

arXiv:2605.05886 · cs.CV · May 8, 2026

TL;DR

This paper introduces ContactPrompt, a training-free, zero-shot method leveraging multi-modal large language models for dense hand contact estimation, combining semantic understanding with geometric reasoning.

Contribution

The paper proposes a structured, training-free approach that encodes 3D hand geometry for an MLLM and performs multi-stage contact reasoning, and it reports outperforming previous supervised methods.

Findings

01 Outperforms previous supervised dense contact estimation methods

02 Uses structured hand-part segmentation and vertex-grid representation

03 Enables precise dense contact prediction without training
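
This page does not describe how the hand-part segmentation and vertex-grid representation are constructed, so the following is only an illustrative sketch of one way such an encoding could work, not the authors' implementation. It assumes a hypothetical partition of mesh vertices into named hand parts, lays each part's vertex ids out on a small 2D grid that could be described in a text prompt, and maps grid cells chosen by a model back to a dense per-vertex contact mask. The part names, the grid width, and the helpers `build_vertex_grid` and `cells_to_contact` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def build_vertex_grid(vertex_ids: List[int], n_cols: int) -> Grid:
    """Lay out vertex ids row by row; pad the last row with -1 (empty cell)."""
    rows: Grid = []
    for start in range(0, len(vertex_ids), n_cols):
        row = vertex_ids[start:start + n_cols]
        rows.append(row + [-1] * (n_cols - len(row)))
    return rows


def cells_to_contact(
    grids: Dict[str, Grid],
    selected: Dict[str, List[Tuple[int, int]]],
    n_vertices: int,
) -> List[int]:
    """Convert selected (row, col) cells per part into a 0/1 mask over vertices."""
    contact = [0] * n_vertices
    for part, cells in selected.items():
        grid = grids[part]
        for r, c in cells:
            vid = grid[r][c]
            if vid >= 0:  # skip padding cells
                contact[vid] = 1
    return contact


# Hypothetical toy partition of an 8-vertex mesh into two parts.
parts = {"thumb_tip": [0, 1, 2, 3, 4], "index_tip": [5, 6, 7]}
grids = {name: build_vertex_grid(ids, n_cols=3) for name, ids in parts.items()}

# Stand-in for a parsed model response: cells judged to be in contact.
selected = {"thumb_tip": [(0, 1), (1, 1)], "index_tip": [(0, 0)]}
contact = cells_to_contact(grids, selected, n_vertices=8)
print(contact)
```

In this sketch the part-level grouping stands in for coarse semantic reasoning, and the cell selection stands in for fine-grained, vertex-level reasoning; a real pipeline would obtain `selected` by parsing an MLLM's answer rather than hard-coding it.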

Abstract

Dense hand contact estimation requires both high-level semantic understanding and fine-grained geometric reasoning of human interaction to accurately localize contact regions. Recently, multi-modal large language models (MLLMs) have demonstrated strong capabilities in understanding visual semantics, enabled by vision-language priors learned from large-scale data. However, leveraging MLLMs for dense hand contact estimation remains underexplored. There are two major challenges in applying MLLMs to dense hand contact estimation. First, encoding explicit 3D hand geometry is difficult, as MLLMs primarily operate on vision and language modalities. Second, capturing fine-grained vertex-level contact remains challenging, as MLLMs tend to focus on high-level semantics rather than detailed geometric reasoning. To address these challenges, we propose ContactPrompt, a training-free and zero-shot…
