Deep Literature Survey Automation with an Iterative Workflow

Hongbo Zhang; Han Cui; Yidong Wang; Yijian Tian; Qi Guo; Cunxiang Wang; Jian Wu; Chiyu Song; Yue Zhang

arXiv:2510.21900·cs.CL·October 28, 2025

TL;DR

This paper introduces IterSurvey, an iterative framework for automatic literature survey generation that mimics human reading, improving coherence, coverage, and integration of multimodal content over traditional one-shot methods.

Contribution

The paper presents an iterative, recurrent outline generation framework with paper cards and a review-refine loop, advancing automatic survey quality and multimodal integration.

Findings

1. Outperforms state-of-the-art baselines in content coverage and coherence
2. Enhances survey organization and citation quality
3. Introduces Survey-Arena benchmark for evaluation

Abstract

Automatic literature survey generation has attracted increasing attention, yet most existing systems follow a one-shot paradigm, where a large set of papers is retrieved at once and a static outline is generated before drafting. This design often leads to noisy retrieval, fragmented structures, and context overload, ultimately limiting survey quality. Inspired by the iterative reading process of human researchers, we propose IterSurvey, a framework based on recurrent outline generation, in which a planning agent incrementally retrieves, reads, and updates the outline to ensure both exploration and coherence. To provide faithful paper-level grounding, we design paper cards that distill each paper into its contributions, methods, and findings, and introduce a review-and-refine loop with visualization enhancement to improve textual flow and integrate multimodal elements such as figures and…
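The recurrent outline idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PaperCard`, `Outline`, `recurrent_outline`, and the toy callables are hypothetical, and the retrieval, card-writing, and section-assignment steps (which would be LLM or search calls in a real system) are replaced by plain functions passed in as arguments. The sketch only shows the control flow: retrieve a small batch, distill each paper into a card, update the outline, and let the updated outline steer the next retrieval.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PaperCard:
    """Compact summary of one paper; fields follow the abstract's description."""
    paper_id: str
    contributions: str
    methods: str
    findings: str


@dataclass
class Outline:
    # Maps a section title to the ids of the papers grounded in that section.
    sections: Dict[str, List[str]] = field(default_factory=dict)


def recurrent_outline(
    topic: str,
    retrieve: Callable[[str], List[str]],
    make_card: Callable[[str], PaperCard],
    assign_section: Callable[[PaperCard, Outline], str],
    max_rounds: int = 3,
    batch_size: int = 2,
) -> Outline:
    """Build an outline incrementally instead of in one shot (assumed loop)."""
    outline = Outline()
    seen = set()
    query = topic
    for _ in range(max_rounds):
        # Read only a small batch of unseen papers per round.
        batch = [pid for pid in retrieve(query) if pid not in seen][:batch_size]
        if not batch:
            break  # nothing new to read
        for pid in batch:
            seen.add(pid)
            card = make_card(pid)
            title = assign_section(card, outline)
            outline.sections.setdefault(title, []).append(card.paper_id)
        # Hypothetical heuristic: extend the query with current section titles.
        query = topic + " " + " ".join(outline.sections)
    return outline


# Toy stand-ins for the retrieval and LLM steps (assumptions for illustration).
CORPUS = {
    "p1": ("retrieval", "dense retriever"),
    "p2": ("planning", "outline agent"),
    "p3": ("retrieval", "reranker"),
}


def toy_retrieve(query: str) -> List[str]:
    return sorted(CORPUS)  # ignores the query; a real retriever would not


def toy_card(pid: str) -> PaperCard:
    theme, finding = CORPUS[pid]
    return PaperCard(pid, contributions=f"work on {theme}", methods=theme, findings=finding)


def toy_assign(card: PaperCard, outline: Outline) -> str:
    return card.methods.title()  # section title derived from the card's method


outline = recurrent_outline("survey automation", toy_retrieve, toy_card, toy_assign)
print(outline.sections)
```

With the toy corpus, the first round files `p1` and `p2` under two sections and the second round adds `p3` to the existing retrieval section, so the outline grows without being regenerated from scratch.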

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. Innovative Iterative Paradigm: The core strength of the work is its departure from the conventional one-shot generation model. By thoughtfully mimicking the iterative reading and writing process of human experts, the proposed framework provides a more natural and effective solution to the complex task of survey generation.
2. Well-Designed Mechanisms for Fidelity and Flow: The introduction of "paper cards" ensures faithful, paper-level grounding of the survey content. Combined with the dedicat

Weaknesses

1. Ablation Studies: Ablation experiments are required to validate the effectiveness of the individual modules designed within the workflow.
2. Regarding Evaluation Metrics: In Table 1, several evaluation scores are very close to the baselines. The persuasiveness of the current results is insufficient; it should be tested whether the performance gap remains similarly close if the scale of the scoring metric is widened.
3. Human Alignment: For the LLM-as-a-judge evaluation method, its reliabili

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- The integration of paper cards and a review-and-refine loop with visualization enhancements significantly improves textual flow, cross-sectional coherence, and multimodal integration, as evidenced by the experimental results.
- The proposed Survey-Arena, a pairwise evaluation benchmark that compares machine-generated surveys with human-written ones, offers a more reliable assessment and effectively addresses the limitations of absolute scoring methods.

Weaknesses

My main concerns are the limitations of the paper's coverage and the objectivity of the evaluation. If the authors can address my questions, I would be happy to increase the rating:
- The retrieval database includes only 680K computer science papers from arXiv, resulting in limited coverage of other disciplines and constraining the framework’s generalization ability to non-CS domains.
- The iterative workflow, involving recurrent outline generation and multiple review-and-refine loops, requires re

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper abandons the static "one-time" planning mode and instead adopts a dynamic, iterative workflow which may iteratively improve the quality of the survey. Besides, injecting multimodal items (table and figure) into a survey is interesting.

Weaknesses

1. Lack of efficiency and cost analysis: The iterative framework proposed in the paper, especially the cyclic generation of outlines and multi-round "review and optimization," seems to require a large amount of computational resources and time. However, the paper does not provide a comparative analysis of the efficiency (such as the time required to generate a review) and cost (API usage) between IterSurvey and other baseline methods.
2. The generation details of the "paper card" need to be su
