Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network

Keyu Zhao; Weiquan Lin; Qirui Zheng; Fengli Xu; Yong Li

arXiv:2511.02238·cs.AI·November 5, 2025

Open Access 4 Reviews

TL;DR

This paper introduces Deep Ideation, a framework that improves LLM-generated research ideas by integrating a scientific concept network with iterative refinement, leading to more novel and feasible research ideas.

Contribution

The paper presents a novel framework combining scientific networks and an explore-expand-evolve workflow to improve LLM-based research ideation, surpassing previous simplistic methods.

Findings

01 · Improved idea quality by 10.67% over existing methods

02 · Generated ideas exceed top conference acceptance levels

03 · Ablation studies validate each component's effectiveness

Abstract

Novel research ideas play a critical role in advancing scientific inquiries. Recent advancements in Large Language Models (LLMs) have demonstrated their potential to generate novel research ideas by leveraging large-scale scientific literature. However, previous work in research ideation has primarily relied on simplistic methods, such as keyword co-occurrence or semantic similarity. These approaches focus on identifying statistical associations in the literature but overlook the complex, contextual relationships between scientific concepts, which are essential to effectively leverage knowledge embedded in human literature. For instance, papers that simultaneously mention "keyword A" and "keyword B" often present research ideas that integrate both concepts. Additionally, some LLM-driven methods propose and refine research ideas using the model's internal knowledge, but they fail to…
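The keyword co-occurrence baseline that the abstract contrasts against can be illustrated with a minimal sketch. This is an illustration only, not the paper's implementation: the function names `build_cooccurrence_network` and `neighbors`, the edge layout, and the toy input are all hypothetical. Each edge records only a count and the papers where both keywords appear, which is exactly the kind of statistical association the abstract says misses contextual relationships.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(papers):
    """Build an undirected keyword co-occurrence graph.

    papers: dict mapping a paper id to an iterable of keywords (hypothetical input format).
    Returns a dict mapping a sorted keyword pair to {"weight": int, "papers": [ids]}.
    """
    edges = defaultdict(lambda: {"weight": 0, "papers": []})
    for paper_id, keywords in papers.items():
        # Normalize case and drop duplicates within a single paper.
        unique = sorted({k.lower() for k in keywords})
        for a, b in combinations(unique, 2):
            edge = edges[(a, b)]
            edge["weight"] += 1
            edge["papers"].append(paper_id)
    return dict(edges)


def neighbors(edges, keyword):
    """Return {neighbor keyword: co-occurrence count} for one keyword."""
    keyword = keyword.lower()
    result = {}
    for (a, b), data in edges.items():
        if a == keyword:
            result[b] = data["weight"]
        elif b == keyword:
            result[a] = data["weight"]
    return result


if __name__ == "__main__":
    toy_papers = {
        "p1": ["LLM", "graph"],
        "p2": ["llm", "graph", "agent"],
    }
    network = build_cooccurrence_network(toy_papers)
    print(neighbors(network, "LLM"))
```

Keeping the contributing paper ids on each edge is one simple way to let a later step look up the text in which two keywords co-occur, rather than relying on the count alone.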

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The paper addresses an interesting and important area: LLM-driven research-idea generation.
- Large corpus (100K papers) for the concept network, with a modular explore / expand / evolve design and a critic feedback loop.
- Clear visual pipeline in Figs. 1-2, easy to follow and well presented.
- Consistent experiment reporting across 4 domains with intent and structured scoring rubrics. The evaluation model reported is comprehensive.
- Ablations without critic show the contribution of the cr

Weaknesses

- Similar concept networks and iterative LLM ideation already exist. The methodological novelty is somewhat limited to incremental algorithmic refinements rather than conceptual advances.
- The critic is trained on LLM-generated paper–review pairs rather than real human reviews. Although an ablation (with vs. without critic) is included, the small (3–5%) improvement is measured only by LLM judges without correlation or reliability analysis. Also how to prevent the data contamination and model checkp

Reviewer 02 · Rating 6 · Confidence 5

Strengths

1. The core innovation lies in integrating a comprehensive scientific concept network that incorporates contextual relationships between keywords, providing a richer, more grounded foundation for LLM ideation than previous statistical methods.
2. The explore-expand-evolve workflow, combined with an Idea Stack and keyword management modules, allows for a structured and continuous optimization of research ideas, mirroring the cognitive process of human researchers.
3. The introduction of a Critic

Weaknesses

1. The formal definition of the edge feature $F_{ij}$ (contextual relationship) is high-level ($g(\cdot)$ aggregating relation). The paper lacks detail on how this complex contextual information is practically represented, quantified, and distilled into a format that the LLM agent can robustly query and leverage for "Relation Analysis" beyond just raw text snippets.
2. The Critic Model is fine-tuned on a relatively small dataset (4278 examples). Its ability to consistently and accurately

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The paper collects approximately 100k papers from 10 major AI conferences to create such co-occurrence keywords. Finally, the paper released a review dataset based on real-world reviewer feedback.
2. The paper includes an ablation study and a case study. The paper also includes several SOTA baselines. The paper conducts both human and automatic evaluation. The paper provides both quantitative and qualitative analysis.
3. The paper includes additional implementation details in the appendix.

Weaknesses

1. Why use a co-occurrence concept graph instead of using scientific IE systems to extract keywords and their relationships? The framework seems to be purely based on prompting. The idea of using neighboring keywords has been proposed in ResearchAgent and SciMon. The paper might need to include a domain-specific LLM such as OLMO2.
2. Some details of the paper are also not clear. For the method section, the evolve part is really confusing. What is the exact algorithm? How to determine whether the

Reviewer 04 · Rating 2 · Confidence 4

Strengths

(1) The work moves beyond static keyword or embedding-based ideation by dynamically retrieving and composing concept relations from a curated scientific network.
(2) Fine-tuning on real review text gives a realistic feedback signal for novelty and feasibility assessment.
(3) The released concept network and dataset could support further AI-for-Science research.

Weaknesses

(1) The paper does not analyze whether the training corpus, review data, and accepted-paper references pre- or post-date the LLMs’ training cut-off, which is important for judging fairness and novelty.
(2) Section 4.2.2 lists 54 researchers but omits selection criteria, expertise, or calibration details.
(3) Novelty/feasibility scores are subjective; the paper reports no inter-rater agreement between human evaluators, nor consistency among different LLM judges.
(4) Only Evolve and Critic modules are te
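As an illustration of the inter-rater agreement analysis requested in point (3), the following is a minimal sketch of Cohen's kappa for two raters who assign categorical scores to the same set of ideas. It is not taken from the paper; the function name `cohen_kappa` and the toy ratings are hypothetical, and a real study with more than two raters or ordinal scores might prefer a different statistic (for example, a weighted kappa).

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters over the same items (categorical labels)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(ratings_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical binary "novel / not novel" judgments from two evaluators.
    rater_1 = [1, 1, 0, 0]
    rater_2 = [1, 0, 0, 0]
    print(cohen_kappa(rater_1, rater_2))
```

The same function could be applied to pairs of LLM judges to check their mutual consistency, which is the second half of the reviewer's concern.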

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling · Biomedical Text Mining and Ontologies