LDC: Learning to Generate Research Idea with Dynamic Control
Ruochen Li, Liqiang Jing, Chi Han, Jiawei Zhou, Xinya Du

TL;DR
This paper introduces LDC, a novel two-stage framework combining supervised fine-tuning and controllable reinforcement learning to generate high-quality research ideas that balance novelty, feasibility, and effectiveness.
Contribution
It is the first to integrate multi-dimensional reward models with dynamic control for research idea generation, improving alignment with expert standards.
Findings
Achieved balanced research idea generation with high quality.
Effectively navigated trade-offs among key idea dimensions.
Demonstrated superior performance over existing prompting methods.
Abstract
Recent advancements in large language models (LLMs) have demonstrated their potential in automating scientific research ideation. Existing approaches primarily focus on prompting techniques and often produce ideas misaligned with expert standards: novelty, feasibility, and effectiveness, which are widely recognized by the research community as the three key subdimensions of high-quality ideas. Balancing these dimensions also remains challenging due to their inherent trade-offs. To address these limitations, we propose the first framework that employs a two-stage approach combining Supervised Fine-Tuning (SFT) and controllable Reinforcement Learning (RL) for the task. In the SFT stage, the model learns foundational patterns from pairs of research papers and their corresponding follow-up ideas. In the RL stage, multi-dimensional reward models guided by fine-grained feedback evaluate…
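To make the idea of balancing several reward dimensions concrete, here is a minimal sketch of one way to scalarize per-dimension scores into a single RL reward. The abstract is truncated here and does not say how LDC combines its reward models or implements dynamic control, so the weighted average, the function name `combine_rewards`, and the weight values are all hypothetical illustrations, not the paper's method.

```python
from typing import Dict

# The three dimensions named in the abstract.
DIMENSIONS = ("novelty", "feasibility", "effectiveness")


def combine_rewards(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical scalarization: weighted average of per-dimension rewards.

    `scores` holds one reward per dimension (e.g. from separate reward models);
    `weights` holds non-negative control weights chosen by the user.
    This is an assumed combination rule, not the one used in the paper.
    """
    total = sum(weights[d] for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total


if __name__ == "__main__":
    scores = {"novelty": 0.9, "feasibility": 0.3, "effectiveness": 0.6}
    # Emphasizing feasibility lowers the combined reward for this idea,
    # which illustrates the trade-off between dimensions.
    balanced = {"novelty": 1.0, "feasibility": 1.0, "effectiveness": 1.0}
    cautious = {"novelty": 0.5, "feasibility": 2.0, "effectiveness": 1.0}
    print(combine_rewards(scores, balanced))
    print(combine_rewards(scores, cautious))
```

Changing the weights shifts which ideas score highest, which is the kind of trade-off the abstract says is hard to balance with prompting alone.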
Taxonomy
Topics: Innovative Teaching and Learning Methods · Education and Critical Thinking Development · Problem and Project Based Learning
Methods: Shrink and Fine-Tune
