From Scan to Action: Leveraging Realistic Scans for Embodied Scene Understanding

Anna-Maria Halacheva; Jan-Nico Zaech; Sombit Dey; Luc Van Gool; Danda Pani Paudel

arXiv:2507.17585·cs.CV·July 24, 2025


Open Access

TL;DR

This paper presents a methodology for leveraging realistic 3D scene scans and their annotations in downstream applications, demonstrated on LLM-based scene editing (80% success) and robotic policy learning in simulation (87% success rate).

Contribution

It introduces a unified annotation integration based on USD, with application-specific USD flavors, and mitigation strategies for the challenges of using holistic real-world scan datasets.

Findings

1. 80% success in LLM-based scene editing
2. 87% success rate in robotic policy learning
3. Effective integration of diverse scan annotations

Abstract

Real-world 3D scene-level scans offer realism and can enable better real-world generalizability for downstream applications. However, challenges such as data volume, diverse annotation formats, and tool compatibility limit their use. This paper demonstrates a methodology to effectively leverage these scans and their annotations. We propose a unified annotation integration using USD, with application-specific USD flavors. We identify challenges in utilizing holistic real-world scan datasets and present mitigation strategies. The efficacy of our approach is demonstrated through two downstream applications: LLM-based scene editing, enabling effective LLM understanding and adaptation of the data (80% success), and robotic simulation, achieving an 87% success rate in policy learning.
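To make the idea of unified annotation integration concrete, here is a minimal sketch of merging per-object annotations from two sources and writing them out as USD ASCII (.usda) text. It is not the paper's pipeline or schema: the function names, the `annotation:` attribute namespace, and the example fields (`label`, `mass_kg`, `joint`) are all hypothetical, and the text is generated by hand rather than through the USD Python bindings so that the example has no dependencies.

```python
# Hypothetical sketch: merge per-object annotations from several sources
# and emit them as USD ASCII (.usda) text. The attribute namespace and
# field names are illustrative assumptions, not the paper's schema.

def merge_annotations(*sources):
    """Merge dicts of {object_id: {field: value}}; later sources win on conflicts."""
    merged = {}
    for source in sources:
        for object_id, fields in source.items():
            merged.setdefault(object_id, {}).update(fields)
    return merged


def to_usda(scene_name, annotations):
    """Render merged annotations as one Xform prim per object under a root prim.

    Assumes object ids are valid USD prim names (letters, digits, underscores,
    not starting with a digit) and that string values contain no quotes.
    """
    lines = ["#usda 1.0", "", f'def Xform "{scene_name}"', "{"]
    for object_id in sorted(annotations):
        lines.append(f'    def Xform "{object_id}"')
        lines.append("    {")
        for field in sorted(annotations[object_id]):
            value = annotations[object_id][field]
            if isinstance(value, str):
                lines.append(f'        custom string annotation:{field} = "{value}"')
            else:
                # Any non-string value is treated as a number here.
                lines.append(f"        custom double annotation:{field} = {float(value)}")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Two made-up annotation sources describing the same scanned scene.
semantic = {"chair_01": {"label": "chair"}}
physics = {"chair_01": {"mass_kg": 4.2},
           "door_03": {"label": "door", "joint": "revolute"}}

merged = merge_annotations(semantic, physics)
usda_text = to_usda("scanned_room", merged)
print(usda_text)
```

Keeping every source's fields on the same prim is the point of the sketch: a downstream tool (an LLM prompt builder or a simulator importer) could then read only the attributes it needs from a single file. A real implementation would more likely use the USD libraries and typed schemas than string formatting.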


Taxonomy

Topics: Human Pose and Action Recognition