SpatialTraceGen: High-Fidelity Traces for Efficient VLM Spatial Reasoning Distillation

Gio Huh; Dhruv Sheth; Rayhan Zirvi; Frank Xiao

arXiv:2511.00054 · cs.LG · November 4, 2025

Open Access

TL;DR

SpatialTraceGen creates high-quality, step-by-step reasoning datasets for vision-language models, enabling more efficient spatial reasoning training through automated verification and distillation from large models.

Contribution

The paper introduces SpatialTraceGen, a novel framework that distills reasoning traces from large models into high-quality datasets with automated fidelity verification.

Findings

01. Improves the average trace quality score by 17% on the CLEVR-Humans benchmark

02. Reduces the variance of trace quality by over 40%

03. Provides structured reasoning datasets for better fine-tuning

Abstract

While Vision-Language Models (VLMs) excel in many areas, they struggle with complex spatial reasoning, which requires problem decomposition and strategic tool use. Fine-tuning smaller, more deployable models offers an efficient path to strong performance, but this is hampered by a major bottleneck: the absence of high-quality, step-by-step reasoning data. To address this data-efficiency gap, we introduce SpatialTraceGen, a framework to distill the reasoning processes of a large teacher model into a high-quality dataset of multi-hop, multi-tool reasoning traces. A key innovation is our automated Verifier, which scalably ensures the fidelity of each reasoning step, providing a cost-effective alternative to manual human annotation. On the CLEVR-Humans benchmark, this verifier-guided process improves the average quality score of traces by 17% while reducing quality variance by over 40%.…
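
The abstract describes a teacher model that proposes reasoning steps and a Verifier that checks the fidelity of each step. A minimal sketch of what such a verifier-guided loop could look like follows. It is an illustration under stated assumptions, not the authors' implementation: the names `Step`, `Trace`, `build_trace`, `propose`, `verify`, the score threshold, and the retry policy are all hypothetical, and the toy teacher and verifier stand in for real models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    tool: str            # hypothetical tool name, e.g. "segment"
    rationale: str       # the teacher's stated reason for this step
    score: float = 0.0   # quality score assigned by the verifier


@dataclass
class Trace:
    question: str
    steps: List[Step] = field(default_factory=list)


def build_trace(
    question: str,
    propose: Callable[[Trace, int], Optional[Step]],
    verify: Callable[[Trace, Step], float],
    max_steps: int = 4,
    threshold: float = 0.5,   # assumed acceptance cutoff
    max_retries: int = 2,     # assumed number of re-proposals per step
) -> Trace:
    """Grow a trace one step at a time, keeping only verifier-approved steps."""
    trace = Trace(question)
    for _ in range(max_steps):
        accepted = None
        for attempt in range(max_retries + 1):
            step = propose(trace, attempt)
            if step is None:          # teacher signals the trace is complete
                return trace
            step.score = verify(trace, step)
            if step.score >= threshold:
                accepted = step
                break
        if accepted is None:          # stop rather than keep a low-fidelity step
            break
        trace.steps.append(accepted)
    return trace


# Toy stand-ins for the teacher and the verifier (assumptions for illustration).
def toy_propose(trace: Trace, attempt: int) -> Optional[Step]:
    if len(trace.steps) >= 2:
        return None
    if attempt == 0:
        return Step(tool="segment", rationale="guess")
    return Step(tool="segment", rationale="locate the red cube")


def toy_verify(trace: Trace, step: Step) -> float:
    return 0.2 if step.rationale == "guess" else 0.9


trace = build_trace("What is left of the red cube?", toy_propose, toy_verify)
```

In this sketch the first proposal for each step is rejected and the re-proposal is accepted, so the resulting trace contains only steps whose score clears the threshold. Filtering at the step level, rather than scoring whole traces after the fact, is one plausible way a verifier could both raise the average quality and narrow its spread.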

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Topic Modeling