Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network
Keyu Zhao, Weiquan Lin, Qirui Zheng, Fengli Xu, Yong Li

TL;DR
This paper introduces Deep Ideation, a framework that enhances LLM-generated research ideas by integrating scientific concept networks and iterative refinement, leading to more novel and feasible research ideas.
Contribution
The paper presents a novel framework combining scientific networks and an explore-expand-evolve workflow to improve LLM-based research ideation, surpassing previous simplistic methods.
Findings
Improved idea quality by 10.67% over existing methods
Generated ideas exceed top conference acceptance levels
Ablation studies validate each component's effectiveness
Abstract
Novel research ideas play a critical role in advancing scientific inquiries. Recent advancements in Large Language Models (LLMs) have demonstrated their potential to generate novel research ideas by leveraging large-scale scientific literature. However, previous work in research ideation has primarily relied on simplistic methods, such as keyword co-occurrence or semantic similarity. These approaches focus on identifying statistical associations in the literature but overlook the complex, contextual relationships between scientific concepts, which are essential to effectively leverage knowledge embedded in human literature. For instance, papers that simultaneously mention "keyword A" and "keyword B" often present research ideas that integrate both concepts. Additionally, some LLM-driven methods propose and refine research ideas using the model's internal knowledge, but they fail to…
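The abstract contrasts the framework with keyword co-occurrence baselines. As a rough illustration of what such a baseline computes, here is a minimal sketch that builds an undirected co-occurrence network and looks up a keyword's neighbors. It assumes each paper has already been reduced to a set of keywords; the function names `build_cooccurrence_network` and `neighbors` are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(papers: list[set[str]]) -> Counter:
    """Count, for each unordered keyword pair, how many papers mention both."""
    edges: Counter = Counter()
    for keywords in papers:
        # sorted() gives a canonical order, so (a, b) and (b, a) map to one edge
        for a, b in combinations(sorted(keywords), 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges: Counter, keyword: str, min_count: int = 1) -> list[tuple[str, int]]:
    """Return keywords co-occurring with `keyword`, ranked by co-occurrence count."""
    found = []
    for (a, b), count in edges.items():
        if count < min_count:
            continue
        if a == keyword:
            found.append((b, count))
        elif b == keyword:
            found.append((a, count))
    # Highest count first; ties broken alphabetically for a stable order
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))
```

Such a graph records only *that* two keywords appear together, not *how* they relate, which is the gap the abstract points to.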
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper addresses an interesting and important area: LLM-driven research-idea generation. - Large corpus (100K papers) for the concept network, with a modular explore / expand / evolve design and a critic feedback loop. - Clear visual pipeline in Figs. 1–2; easy to follow, with good presentation. - Consistent experiment reporting across 4 domains with intent and structured scoring rubrics. The reported evaluation model is comprehensive. - Ablations without the critic show the contribution of the cr
- Similar concept networks and iterative LLM ideation already exist. The method-wise novelty is somewhat limited to incremental algorithmic refinements rather than conceptual advances. - The critic is trained on LLM-generated paper–review pairs rather than real human reviews. Although an ablation (with vs. without critic) is included, the small (3–5%) improvement is measured only by LLM judges without correlation or reliability analysis. Also, how to prevent data contamination and model checkp
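The correlation analysis this review asks for could take the form of a rank correlation between human scores and LLM-judge scores for the same ideas. A minimal sketch of Spearman's rho, computed as the Pearson correlation of average ranks so that tied scores (common on integer rubrics) are handled, follows. The functions are illustrative helpers, not part of the paper.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(human: list[float], judge: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(human) != len(judge) or len(human) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rh, rj = average_ranks(human), average_ranks(judge)
    n = len(rh)
    mh, mj = sum(rh) / n, sum(rj) / n
    cov = sum((a - mh) * (b - mj) for a, b in zip(rh, rj))
    var_h = sum((a - mh) ** 2 for a in rh)
    var_j = sum((b - mj) ** 2 for b in rj)
    if var_h == 0 or var_j == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_h * var_j)
```

A rho near 1 would indicate that the LLM judge orders ideas much as humans do; a small 3–5% gain is only meaningful if that ordering is reliable.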
1. The core innovation lies in integrating a comprehensive scientific concept network that incorporates contextual relationships between keywords, providing a richer, more grounded foundation for LLM ideation than previous statistical methods. 2. The explore-expand-evolve workflow, combined with an Idea Stack and keyword management modules, allows for a structured and continuous optimization of research ideas, mirroring the cognitive process of human researchers. 3. The introduction of a Critic
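The explore-expand-evolve workflow with an Idea Stack is described here only at a high level, and another review notes that the exact evolve algorithm is unclear. The sketch below is therefore a hypothetical control loop, not the authors' algorithm: `explore`, `expand`, and `critic` are placeholder callables standing in for LLM or network calls, and the accept-or-backtrack rule used for "evolve" is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Idea:
    text: str
    keywords: list[str]
    score: float


def ideation_loop(
    seed_keywords: list[str],
    explore: Callable[[list[str]], list[str]],  # hypothetical: related keywords from a concept network
    expand: Callable[[list[str]], str],  # hypothetical: draft an idea from a keyword set
    critic: Callable[[str], float],  # hypothetical: score an idea, higher is better
    max_steps: int = 5,
) -> list[Idea]:
    """Grow a keyword set, draft ideas, and keep an Idea Stack of improving ideas."""
    stack: list[Idea] = []
    keywords = list(seed_keywords)
    for _ in range(max_steps):
        # Explore: add new related keywords (keyword management skips duplicates)
        for kw in explore(keywords):
            if kw not in keywords:
                keywords.append(kw)
        # Expand: turn the current keyword set into a candidate idea and score it
        text = expand(keywords)
        candidate = Idea(text, list(keywords), critic(text))
        # Evolve (assumed rule): push if it beats the top of the stack, else backtrack
        if not stack or candidate.score > stack[-1].score:
            stack.append(candidate)
        else:
            keywords = list(stack[-1].keywords)
    return stack
```

Keeping every accepted idea on a stack, rather than only the latest one, makes it cheap to roll back to the best previous state when a refinement step scores worse.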
1. The formal definition of the edge feature $F_{ij}$ (contextual relationship) is high-level ($g(\cdot)$ aggregating relations). The paper lacks detail on how this complex contextual information is practically represented, quantified, and distilled into a format that the LLM agent can robustly query and leverage for "Relation Analysis" beyond just raw text snippets. 2. The Critic Model is fine-tuned on a relatively small dataset (4278 examples). Its ability to consistently and accurately
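To make the reviewer's point concrete, here is one hypothetical way an aggregation $g(\cdot)$ could turn per-paper relation snippets into a queryable edge feature $F_{ij}$. It assumes relations arrive as `(keyword_i, keyword_j, relation_text)` triples and that $g$ keeps a support count plus the most frequent distinct relation descriptions. Both choices are illustrative assumptions; the paper does not specify them.

```python
from collections import Counter, defaultdict


def aggregate_edge_features(
    relations: list[tuple[str, str, str]], top_k: int = 3
) -> dict[tuple[str, ...], dict]:
    """Hypothetical g(.): group relation texts per undirected edge and summarize them."""
    grouped: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for ki, kj, text in relations:
        # Undirected edge: canonical keyword order so (i, j) and (j, i) merge
        edge = tuple(sorted((ki, kj)))
        grouped[edge].append(text.strip())
    features = {}
    for edge, texts in grouped.items():
        # most_common keeps first-seen order among equal counts
        common = Counter(texts).most_common(top_k)
        features[edge] = {
            "support": len(texts),  # number of snippets backing this edge
            "relations": [t for t, _ in common],  # distinct descriptions, most frequent first
        }
    return features
```

A structured feature like this can be serialized into a prompt as short, ranked evidence, rather than passing every raw snippet to the agent.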
1. The paper collects approximately 100k papers from 10 major AI conferences to build the keyword co-occurrence network. Finally, the paper releases a review dataset based on real-world reviewer feedback. 2. The paper includes an ablation study and a case study. The paper also includes several SOTA baselines. The paper conducts both human and automatic evaluation. The paper provides both quantitative and qualitative analysis. 3. The paper includes additional implementation details in the appendix.
1. Why use a co-occurrence concept graph instead of using scientific IE systems to extract keywords and their relationships? The framework seems to be purely based on prompting. The idea of using neighboring keywords has been proposed in ResearchAgent and SciMon. The paper might need to include a domain-specific LLM such as OLMO2. 2. Some details of the paper are also not clear. For the method section, the evolve part is really confusing. What is the exact algorithm? How to determine whether the
(1) The work moves beyond static keyword or embedding-based ideation by dynamically retrieving and composing concept relations from a curated scientific network. (2) Fine-tuning on real review text gives a realistic feedback signal for novelty and feasibility assessment. (3) The released concept network and dataset could support further AI-for-Science research.
(1) The paper does not analyze whether the training corpus, review data, and accepted-paper references pre- or post-date the LLMs’ training cut-off—important for judging fairness and novelty. (2) Section 4.2.2 lists 54 researchers but omits selection criteria, expertise, or calibration details. (3) Novelty/feasibility scores are subjective; the paper reports no inter-rater agreement between human evaluators, nor consistency among different LLM judges. (4) Only Evolve and Critic modules are te
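Point (3) asks for inter-rater agreement. For two raters assigning categorical scores (for example, integer novelty ratings) to the same ideas, a standard statistic is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch follows; with 54 evaluators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the more natural choice, so this two-rater version is only illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Cohen's kappa for two raters labeling the same items with categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined,
        # and returning 1.0 (perfect agreement) is a convention chosen here
        return 1.0
    return (observed - expected) / (1 - expected)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance; the same function could also compare two LLM judges.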
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
