MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs
Lun Zhan, Feng Xiong, Huanyong Liu, Feng Zhang, Yuhui Yin

TL;DR
MMKG-RDS is a flexible framework that synthesizes high-quality reasoning data from multimodal knowledge graphs, improving model reasoning and enabling complex benchmark creation.
Contribution
It introduces a novel, customizable reasoning data synthesis framework leveraging multimodal knowledge graphs with fine-grained extraction and quality scoring.
Findings
Fine-tuning Qwen3 models (0.6B/8B/32B) on a small number of synthesized samples improves reasoning accuracy by 9.2%.
Generated data challenges existing models on complex tasks.
The framework supports diverse domain and task coverage.
Abstract
Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, and interpretability. Knowledge-graph-based approaches still fall short in functionality, granularity, customizability, and evaluation. To address these issues, we propose MMKG-RDS, a flexible framework for reasoning data synthesis that leverages multimodal knowledge graphs. It supports fine-grained knowledge extraction, customizable path sampling, and multidimensional data quality scoring. We validate MMKG-RDS with the MMKG-RDS-Bench dataset, covering five domains, 17 task types, and 14,950 samples. Experimental results show fine-tuning Qwen3 models (0.6B/8B/32B) on a small number of synthesized samples improves reasoning accuracy by 9.2%. The framework also generates distinct data,…
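The abstract names customizable path sampling as one component but does not describe how it works. As a rough illustration only, and not the authors' implementation, the sketch below samples a multi-hop path from a toy graph by random walk, with the hop count and an optional relation filter as the customizable parts. The triples, the function names `build_adjacency` and `sample_path`, and the parameters are all hypothetical.

```python
import random
from collections import defaultdict

# Toy (head, relation, tail) triples. All entities and relations are
# hypothetical; "has_image" stands in for a link to a non-text node.
TRIPLES = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "has_image", "fig_aspirin_structure"),
    ("headache", "symptom_of", "migraine"),
    ("migraine", "treated_by", "sumatriptan"),
]


def build_adjacency(triples):
    """Index outgoing edges by head entity."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj


def sample_path(adj, start, hops, allowed_relations=None, rng=None):
    """Random-walk up to `hops` edges from `start` without revisiting nodes.

    `allowed_relations` (a set, or None for no filter) restricts which
    edge types the walk may follow. Returns a list of triples, which may
    be shorter than `hops` if the walk reaches a dead end.
    """
    rng = rng or random.Random()
    path, node, visited = [], start, {start}
    for _ in range(hops):
        candidates = [
            (rel, tail)
            for rel, tail in adj.get(node, [])
            if tail not in visited
            and (allowed_relations is None or rel in allowed_relations)
        ]
        if not candidates:
            break  # dead end: return the partial path
        rel, tail = rng.choice(candidates)
        path.append((node, rel, tail))
        visited.add(tail)
        node = tail
    return path


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    text_only = {"treats", "symptom_of", "treated_by"}
    for triple in sample_path(adj, "aspirin", hops=3, allowed_relations=text_only):
        print(triple)
```

A sampled path like this could then be handed to a question generator as the reasoning chain behind a multi-hop question. How MMKG-RDS actually selects, filters, and scores paths is not stated in the text above.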
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Graph Theory and Algorithms
