GRAFT: Geometric Refinement and Fitting Transformer for Human Scene Reconstruction

Pradyumna YM; Yuxuan Xue; Yue Chen; Nikita Kister; István Sárándi; Gerard Pons-Moll

arXiv:2604.19624·cs.CV·April 22, 2026

TL;DR

GRAFT is a transformer-based method that efficiently refines 3D human-scene interaction reconstructions from a single image, combining accuracy and speed.

Contribution

It introduces a learned prior that predicts interaction gradients, enabling fast iterative refinement of human meshes with explicit scene reasoning. The prior can be used either end-to-end or as a plug-and-play component.

Findings

01. GRAFT improves interaction quality by up to 113% over state-of-the-art feed-forward methods.

02. It matches the quality of optimization-based methods at approximately 50 times lower runtime.

03. It generalizes well to in-the-wild multi-person scenes and is preferred in 64.8% of user-study comparisons.

Abstract

Reconstructing physically plausible 3D human-scene interactions (HSI) from a single image currently presents a trade-off: optimization-based methods offer accurate contact but are slow (~20s), while feed-forward approaches are fast yet lack explicit interaction reasoning, producing floating and interpenetration artifacts. Our key insight is that geometry-based human-scene fitting can be amortized into fast feed-forward inference. We present GRAFT (Geometric Refinement And Fitting Transformer), a learned HSI prior that predicts Interaction Gradients: corrective parameter updates that iteratively refine human meshes by reasoning about their 3D relationship to the surrounding scene. GRAFT encodes the interaction state into compact body-anchored tokens, each grounded in the scene geometry via Geometric Probes that capture spatial relationships with nearby surfaces. A lightweight…
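
The iterative refinement described in the abstract can be illustrated with a toy loop. The following is only a minimal sketch under strong assumptions, not the authors' implementation: the scene is reduced to a flat floor at z = 0, the body parameters are reduced to a single vertical offset, and the learned transformer is replaced by a hand-written stand-in. All names (`probe_signed_heights`, `toy_update_predictor`, `refine`) are hypothetical and do not come from the paper or its repository.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


def probe_signed_heights(vertices: List[Vec3], offset_z: float) -> List[float]:
    """Toy stand-in for a geometric probe (assumption): the signed height of
    each body vertex above a flat floor at z = 0. Positive means floating,
    negative means interpenetration."""
    return [v[2] + offset_z for v in vertices]


def toy_update_predictor(probes: List[float]) -> float:
    """Hand-written stand-in for a learned update predictor (assumption).
    It returns a corrective update to offset_z that moves the lowest vertex
    halfway toward the floor. A real model would be trained, not hard-coded."""
    return -0.5 * min(probes)


def refine(
    vertices: List[Vec3],
    offset_z: float,
    predictor: Callable[[List[float]], float],
    num_steps: int = 8,
) -> Tuple[float, List[float]]:
    """Repeatedly probe the scene, predict a corrective update, and apply it.
    Returns the final offset and the offset after every step."""
    history = [offset_z]
    for _ in range(num_steps):
        probes = probe_signed_heights(vertices, offset_z)
        offset_z += predictor(probes)  # apply the predicted corrective update
        history.append(offset_z)
    return offset_z, history


# Usage: a body whose lowest vertex starts 0.2 units above the floor.
body = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.9), (0.0, 0.0, 1.7)]
final_offset, trace = refine(body, 0.2, toy_update_predictor)
```

In this sketch each step halves the remaining gap, so both a floating start (positive offset) and an interpenetrating start (negative offset) converge toward contact. The point is only the structure of the loop: probe the geometry, predict an update, apply it, repeat.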

Code & Models

Repositories

https://pradyumnaym.github.io/graft
github
