TL;DR
This paper introduces IterSurvey, an iterative framework for automatic literature survey generation that mimics how human researchers read, improving coherence, coverage, and integration of multimodal content over traditional one-shot methods.
Contribution
The paper presents an iterative, recurrent outline generation framework with paper cards and a review-refine loop, advancing automatic survey quality and multimodal integration.
Findings
Outperforms state-of-the-art baselines in content coverage and coherence
Enhances survey organization and citation quality
Introduces Survey-Arena benchmark for evaluation
Abstract
Automatic literature survey generation has attracted increasing attention, yet most existing systems follow a one-shot paradigm, where a large set of papers is retrieved at once and a static outline is generated before drafting. This design often leads to noisy retrieval, fragmented structures, and context overload, ultimately limiting survey quality. Inspired by the iterative reading process of human researchers, we propose IterSurvey, a framework based on recurrent outline generation, in which a planning agent incrementally retrieves, reads, and updates the outline to ensure both exploration and coherence. To provide faithful paper-level grounding, we design paper cards that distill each paper into its contributions, methods, and findings, and introduce a review-and-refine loop with visualization enhancement to improve textual flow and integrate multimodal elements such as figures and…
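The recurrent outline generation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `PaperCard` fields follow the abstract (contributions, methods, findings), but `retrieve`, `update_outline`, `build_outline`, the keyword-overlap scoring, and the toy corpus are all hypothetical stand-ins for the LLM-based planning agent and the real retrieval database.

```python
from dataclasses import dataclass, field


@dataclass
class PaperCard:
    """Compact summary of one paper; fields mirror the abstract's description."""
    title: str
    contributions: str
    methods: str
    findings: str


@dataclass
class Outline:
    topic: str
    # Maps a section name to the titles of the papers filed under it.
    sections: dict[str, list[str]] = field(default_factory=dict)


def retrieve(query, corpus, seen, k=2):
    """Hypothetical retriever: keyword overlap stands in for a real search index."""
    terms = set(query.lower().split())
    scored = []
    for card in corpus:
        if card.title in seen:
            continue  # never re-read a paper that is already in the outline
        text = f"{card.title} {card.contributions} {card.methods} {card.findings}"
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((score, card))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [card for _, card in scored[:k]]


def update_outline(outline, cards):
    """Hypothetical planner step: file each card under a section named after its methods."""
    for card in cards:
        outline.sections.setdefault(card.methods, []).append(card.title)


def build_outline(topic, corpus, max_rounds=5):
    """Retrieve, read, and update the outline round by round instead of in one shot."""
    outline = Outline(topic)
    seen = set()
    query = topic
    for _ in range(max_rounds):
        cards = retrieve(query, corpus, seen)
        if not cards:
            break  # nothing new to read, so the outline has converged
        seen.update(card.title for card in cards)
        update_outline(outline, cards)
        # The next query is widened with the section names discovered so far.
        query = topic + " " + " ".join(outline.sections)
    return outline


# Toy corpus, invented purely for illustration.
corpus = [
    PaperCard("Survey generation with agents", "automatic survey drafting", "agents", "better coverage"),
    PaperCard("Outline planning for long documents", "outline planning", "planning", "improved coherence"),
    PaperCard("Agents for retrieval", "retrieval with agents", "agents", "less noise"),
    PaperCard("Protein folding", "structure prediction", "diffusion", "accuracy"),
]
outline = build_outline("survey generation", corpus)
print(outline.sections)
```

In this toy run the third paper is only reached in the second round, after the outline has contributed the term "agents" to the query. A one-shot retrieval on the original topic alone would have missed it, which is the kind of exploration the iterative design is meant to enable.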
Peer Reviews
Decision·Submitted to ICLR 2026
1. Innovative Iterative Paradigm: The core strength of the work is its departure from the conventional one-shot generation model. By thoughtfully mimicking the iterative reading and writing process of human experts, the proposed framework provides a more natural and effective solution to the complex task of survey generation.
2. Well-Designed Mechanisms for Fidelity and Flow: The introduction of "paper cards" ensures faithful, paper-level grounding of the survey content. Combined with the dedicat
1. Ablation Studies: Ablation experiments are required to validate the effectiveness of the individual modules designed within the workflow.
2. Regarding Evaluation Metrics: In Table 1, several evaluation scores are very close to the baselines. The persuasiveness of the current results is insufficient; it should be tested whether the performance gap remains similarly close if the scale of the scoring metric is widened.
3. Human Alignment: For the LLM-as-a-judge evaluation method, its reliabili
- The integration of paper cards and a review-and-refine loop with visualization enhancements significantly improves textual flow, cross-sectional coherence, and multimodal integration, as evidenced by the experimental results.
- The proposed Survey-Arena, a pairwise evaluation benchmark that compares machine-generated surveys with human-written ones, offers a more reliable assessment and effectively addresses the limitations of absolute scoring methods.
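To make the contrast with absolute scoring concrete, pairwise judgments of the kind Survey-Arena collects could be aggregated as follows. This is a sketch under assumptions: the page does not say how Survey-Arena turns comparisons into a ranking, so a plain win rate is used here, and the function name `win_rates`, the system labels, and the judgments are all invented for illustration.

```python
from collections import Counter


def win_rates(judgments):
    """Aggregate pairwise outcomes into a per-system win rate.

    Each judgment is a tuple (system_a, system_b, winner), where winner
    must be one of the two compared systems.
    """
    wins = Counter()
    games = Counter()
    for a, b, winner in judgments:
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}")
        games[a] += 1
        games[b] += 1
        wins[winner] += 1
    return {name: wins[name] / games[name] for name in games}


# Invented judgments: one machine-generated survey against four human-written ones.
judgments = [
    ("machine", "human_1", "human_1"),
    ("machine", "human_2", "machine"),
    ("machine", "human_3", "machine"),
    ("machine", "human_4", "human_4"),
]
rates = win_rates(judgments)
print(rates["machine"])
```

A win rate places the machine-generated survey relative to human-written references directly, so it does not depend on how a judge calibrates an absolute scale.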
My main concerns are the limitations of the paper's coverage and the objectivity of the evaluation. If the author can address my questions, I would be happy to increase the rating:
- The retrieval database includes only 680K computer science papers from arXiv, resulting in limited coverage of other disciplines and constraining the framework’s generalization ability to non-CS domains.
- The iterative workflow, involving recurrent outline generation and multiple review-and-refine loops, requires re
The paper abandons the static "one-time" planning mode and instead adopts a dynamic, iterative workflow that may progressively improve the quality of the survey. In addition, injecting multimodal items (tables and figures) into a survey is interesting.
1. Lack of efficiency and cost analysis: The iterative framework proposed in the paper, especially the cyclic generation of outlines and multi-round "review and optimization," seems to require a large amount of computational resources and time. However, the paper does not provide a comparative analysis of the efficiency (such as the time required to generate a review) and cost (API usage) between IterSurvey and other baseline methods.
2. The generation details of the "paper card" need to be su
