Spatial Knowledge Graph-Guided Multimodal Synthesis
Yida Xue, Zhen Bi, Jinnan Yang, Jungang Lou, Kehai Chen, Min Zhang, Huajun Chen, Ningyu Zhang

TL;DR
This paper introduces SKG2DATA, a novel framework that uses spatial knowledge graphs to guide multimodal data synthesis, improving spatial perception in large language models by generating spatially coherent images and descriptions.
Contribution
The paper presents a systematic, knowledge-to-data synthesis approach guided by spatial knowledge graphs, enabling scalable, diverse, and spatially coherent multimodal data generation.
Findings
Enhanced spatial reasoning in MLLMs after training with synthesized data
Scalable automated construction of spatial knowledge graphs
Improved spatial perception with minimal impact on general capabilities
Abstract
Recent advances in Multimodal Large Language Models (MLLMs) have significantly enhanced their capabilities; however, their spatial perception abilities remain a notable limitation. To address this challenge, multimodal data synthesis offers a promising solution. Yet, ensuring that synthesized data adhere to spatial common sense is a non-trivial task. Our approach addresses this critical gap by providing a systematic framework for generating spatially coherent data. In this work, we introduce SKG2DATA, a novel multimodal synthesis approach guided by spatial knowledge graphs, grounded in the concept of knowledge-to-data generation. SKG2DATA employs an automated pipeline for constructing a Spatial Knowledge Graph (SKG) that effectively captures human-like spatial cognition, including directional and distance relationships. These structured representations then serve as precise guidance for…
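The abstract describes the SKG only at a high level. The following minimal Python sketch is a hedged illustration of how directional and distance relationships might be stored as structured guidance. The relation vocabulary, the coarse distance buckets ("close"/"far"), the class names `SpatialEdge` and `SpatialKG`, and the contradiction check are all hypothetical. They are not SKG2DATA's actual pipeline.

```python
from dataclasses import dataclass, field

# Hypothetical directional vocabulary with inverses (assumption, not from the paper).
INVERSE = {
    "left of": "right of",
    "right of": "left of",
    "above": "below",
    "below": "above",
    "in front of": "behind",
    "behind": "in front of",
}


@dataclass(frozen=True)
class SpatialEdge:
    subject: str
    relation: str   # directional relation, e.g. "left of"
    obj: str
    distance: str   # hypothetical coarse distance bucket, e.g. "close" or "far"


@dataclass
class SpatialKG:
    edges: list = field(default_factory=list)

    def add(self, subject: str, relation: str, obj: str, distance: str) -> None:
        """Add an edge, rejecting it if it directly contradicts an existing edge."""
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation}")
        for e in self.edges:
            # "A left of B" conflicts with an existing "A right of B".
            if (e.subject, e.obj) == (subject, obj) and e.relation == INVERSE[relation]:
                raise ValueError(f"conflict with existing edge: {e}")
            # "A left of B" conflicts with an existing "B left of A".
            if (e.subject, e.obj) == (obj, subject) and e.relation == relation:
                raise ValueError(f"conflict with existing edge: {e}")
        edge = SpatialEdge(subject, relation, obj, distance)
        if edge not in self.edges:  # ignore exact duplicates
            self.edges.append(edge)

    def describe(self) -> str:
        """Render the graph as a plain-text scene description."""
        return " ".join(
            f"The {e.subject} is {e.relation} the {e.obj} ({e.distance})."
            for e in self.edges
        )
```

Usage: after `kg.add("cup", "left of", "laptop", "close")`, a later call `kg.add("laptop", "left of", "cup", "close")` raises `ValueError`. The consistent inverse, `"laptop", "right of", "cup"`, is accepted. Rejecting contradictions at insertion time is one simple way to keep a graph spatially coherent before it is rendered into descriptions or image prompts. The actual framework may enforce coherence differently.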
Taxonomy
Topics: Multimodal Machine Learning Applications · Spatial Cognition and Navigation · Constraint Satisfaction and Optimization
