Socratic-Geo: Synthetic Data Generation and Geometric Reasoning via Multi-Agent Interaction
Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Wei Wang, Bing Zhao, Hu Wei, Linfeng Zhang

TL;DR
Socratic-Geo introduces an autonomous multi-agent framework that enhances geometric reasoning and synthetic data generation for vision-language models, significantly improving performance with less data.
Contribution
It presents a novel multi-agent system coupling data synthesis with model learning, improving geometric reasoning and image generation for vision-language tasks.
Findings
Scores 49.11 on six benchmarks while using only a quarter of the baseline data.
Sets a new state of the art for open-source models on GenExam with 42.4%.
Surpasses prior models such as Seedream-4.0 and approaches Gemini-2.5-Flash-Image.
Abstract
Multimodal Large Language Models (MLLMs) have significantly advanced vision-language understanding. However, even state-of-the-art models struggle with geometric reasoning, revealing a critical bottleneck: the extreme scarcity of high-quality image-text pairs. Human annotation is prohibitively expensive, while automated methods fail to ensure fidelity and training effectiveness. Existing approaches either passively adapt to available images or employ inefficient random exploration with filtering, decoupling generation from learning needs. We propose Socratic-Geo, a fully autonomous framework that dynamically couples data synthesis with model learning through multi-agent interaction. The Teacher agent generates parameterized Python scripts with reflective feedback (Reflect for solvability, RePI for visual validity), ensuring image-text pair purity. The Solver agent optimizes reasoning…
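To make the Teacher agent's idea concrete, here is a minimal sketch of a parameterized script in which the figure and the question text are produced from the same parameters, followed by two checks loosely modeled on the roles the abstract assigns to Reflect (solvability) and RePI (visual validity). Everything here is an illustrative assumption: the names `TriangleProblem`, `make_right_triangle`, `is_solvable`, `is_visually_valid`, and `generate_checked`, the canvas size, and the area threshold are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class TriangleProblem:
    vertices: list   # [(x, y), ...] coordinates a renderer would draw
    question: str    # text paired with the figure
    answer: float    # ground-truth answer computed from the same parameters


def make_right_triangle(a: float, b: float) -> TriangleProblem:
    """Hypothetical parameterized generator: legs a and b define figure and text."""
    vertices = [(0.0, 0.0), (a, 0.0), (0.0, b)]
    question = f"A right triangle has legs of length {a} and {b}. Find the hypotenuse."
    return TriangleProblem(vertices, question, math.hypot(a, b))


def is_solvable(p: TriangleProblem) -> bool:
    """Assumed solvability check: the answer is finite, positive, and
    agrees with the hypotenuse measured from the drawn coordinates."""
    drawn = math.dist(p.vertices[1], p.vertices[2])
    return math.isfinite(p.answer) and p.answer > 0 and math.isclose(drawn, p.answer)


def is_visually_valid(p: TriangleProblem, canvas: float = 10.0,
                      min_area: float = 0.5) -> bool:
    """Assumed visual check: the figure fits the canvas and is not degenerate."""
    (x0, y0), (x1, y1), (x2, y2) = p.vertices
    area = abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2
    in_canvas = all(0 <= x <= canvas and 0 <= y <= canvas for x, y in p.vertices)
    return in_canvas and area >= min_area


def generate_checked(a: float, b: float):
    """Return the problem only if it passes both checks, otherwise None."""
    p = make_right_triangle(a, b)
    return p if is_solvable(p) and is_visually_valid(p) else None
```

Because the coordinates and the question come from the same parameters, an image-text pair that passes both checks cannot disagree with itself, which is one plausible reading of the "purity" the abstract refers to. In the framework described above, a rejected sample would presumably feed back into the Teacher's next script revision rather than simply being discarded; that loop is not modeled here.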
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence Applications
