Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
Zhaorui Yang, Bo Pan, Han Wang, Yiyao Wang, Xingyu Liu, Luoxuan Weng, Yingchaojie Feng, Haozhe Feng, Minfeng Zhu, Bo Zhang, Wei Chen

TL;DR
This paper introduces Multimodal DeepResearcher, a framework enabling LLMs to generate integrated text and visualization reports from scratch, addressing the challenge of combining informative visualizations with textual content.
Contribution
It proposes Formal Description of Visualization (FDV), a structured textual description of charts, and a four-stage agentic framework for multimodal report generation, advancing automated research report creation.
Findings
Achieves an 82% win rate over the baseline method when using Claude 3.7 Sonnet.
Introduces MultimodalReportBench, a benchmark with 100 topics and 5 evaluation metrics.
Demonstrates effective integration of text and visualizations in reports.
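A win rate in this kind of evaluation is the fraction of pairwise comparisons against the baseline that the evaluated system wins. The sketch below assumes each judgment is labeled "win", "loss", or "tie" and that ties count as non-wins; the function name `win_rate` and the tie rule are illustrative assumptions, not the paper's stated evaluation protocol.

```python
from collections import Counter
from typing import Iterable


def win_rate(judgments: Iterable[str]) -> float:
    """Fraction of pairwise comparisons won by the evaluated system.

    Hypothetical helper: assumes labels are "win", "loss", or "tie",
    and counts ties as non-wins (an assumption, not the paper's rule).
    """
    counts = Counter(judgments)
    # Reject labels outside the assumed vocabulary instead of silently ignoring them.
    unknown = set(counts) - {"win", "loss", "tie"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments given")
    return counts["win"] / total
```

Under this convention, a system that wins 82 of 100 comparisons has a win rate of 0.82; a different tie rule (for example, counting a tie as half a win) would change the number.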
Abstract
Visualizations play a crucial part in the effective communication of concepts and information. Recent advances in reasoning and retrieval-augmented generation have enabled Large Language Models (LLMs) to perform deep research and generate comprehensive reports. Despite this progress, existing deep research frameworks primarily focus on generating text-only content, leaving the automated generation of interleaved text and visualizations underexplored. This novel task poses key challenges in designing informative visualizations and effectively integrating them with text reports. To address these challenges, we propose Formal Description of Visualization (FDV), a structured textual representation of charts that enables LLMs to learn from and generate diverse, high-quality visualizations. Building on this representation, we introduce Multimodal DeepResearcher, an agentic framework that…
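The abstract describes FDV only as a structured textual representation of charts. To illustrate why a text form is useful to an LLM, here is a minimal Python sketch under a simplified, hypothetical schema: the class `ChartDescription`, its fields, and the `to_text` layout are illustrative inventions, not the paper's FDV specification. A chart is captured as structured fields and serialized to plain text that a model could read from an exemplar or emit when specifying a new chart.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified stand-in for an FDV-style chart description.
# Field names and text layout are assumptions for illustration only.
@dataclass
class ChartDescription:
    title: str
    chart_type: str  # e.g. "bar" or "line"
    x_field: str
    y_field: str
    data: List[dict] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Serialize the chart into a plain-text block an LLM can read or emit."""
        lines = [
            f"Title: {self.title}",
            f"Type: {self.chart_type}",
            f"X: {self.x_field}",
            f"Y: {self.y_field}",
            "Data:",
        ]
        # One line per data row, keeping the row's key order.
        for row in self.data:
            lines.append("  - " + ", ".join(f"{k}={v}" for k, v in row.items()))
        # Free-form annotations (e.g. highlighted points) go last.
        for note in self.notes:
            lines.append(f"Note: {note}")
        return "\n".join(lines)


if __name__ == "__main__":
    chart = ChartDescription(
        title="Example chart",
        chart_type="bar",
        x_field="category",
        y_field="value",
        data=[{"category": "A", "value": 3}, {"category": "B", "value": 5}],
    )
    print(chart.to_text())
```

Keeping the description as plain text, rather than as an image or rendering code alone, is what lets one format serve both as input (describing existing charts) and as output (specifying new ones); consult the paper for the actual FDV fields.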
