OUTLINEFORGE: Hierarchical Reinforcement Learning with Explicit States for Scientific Writing
Yilin Bao, Ziyao He, Zayden Yang

TL;DR
This paper introduces OUTLINEFORGE, a hierarchical reinforcement learning framework for scientific paper generation that improves global structure, factual accuracy, and citation consistency through structured outline planning and a two-stage optimization process.
Contribution
It presents a novel hierarchical RL approach with explicit state modeling and a new benchmark for evaluating scientific writing quality.
Findings
Improved long-range structural coherence over baselines.
Enhanced citation reliability and factual accuracy.
Effective global planning in scientific document generation.
Abstract
Scientific paper generation requires document-level planning and factual grounding, but current large language models, despite their strong local fluency, often fail in global structure, input coverage, and citation consistency. We present a reinforcement learning framework that casts scientific outline construction as a long-horizon planning problem over hierarchical document structures. Our approach models and edits evolving outlines through structured actions, enabling the system to incrementally build a complete scientific manuscript. To support effective and stable learning, we introduce a two-stage optimization procedure consisting of (i) backward outline reconstruction from partial plans to enforce global structural consistency, and (ii) forward value-guided reinforcement learning with rewards explicitly modeling scientific correctness, discourse coherence, and citation fidelity. In…
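The abstract does not list the action set or the state representation, so the following is only a minimal sketch of what "editing an evolving outline through structured actions" could look like. Every name here (`Node`, `add_section`, `attach_citation`, `render`) is hypothetical and chosen for illustration. The sketch covers the explicit outline state and two edit actions only; it does not attempt the two-stage optimization or the reward model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One outline entry: a heading, its cited sources, and its subsections."""
    title: str
    citations: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


# A path is a sequence of child indices from the root, e.g. (1, 0).
Path = Tuple[int, ...]


def get(root: Node, path: Path) -> Node:
    """Follow child indices from the root to the addressed node."""
    node = root
    for i in path:
        node = node.children[i]
    return node


def add_section(root: Node, parent: Path, title: str) -> Path:
    """Hypothetical action: append a new subsection under `parent`."""
    p = get(root, parent)
    p.children.append(Node(title))
    return parent + (len(p.children) - 1,)


def attach_citation(root: Node, path: Path, key: str) -> None:
    """Hypothetical action: record that a section draws on source `key`."""
    node = get(root, path)
    if key not in node.citations:
        node.citations.append(key)


def render(root: Node, depth: int = 0) -> List[str]:
    """Flatten the outline into indented lines for inspection."""
    lines = ["  " * depth + root.title]
    for child in root.children:
        lines.extend(render(child, depth + 1))
    return lines


# Build a small outline one action at a time (section titles are made up).
outline = Node("Paper")
intro = add_section(outline, (), "Introduction")
method = add_section(outline, (), "Method")
stage1 = add_section(outline, method, "Backward outline reconstruction")
attach_citation(outline, intro, "source-1")
print("\n".join(render(outline)))
```

Keeping the outline as an explicit tree means each action changes one addressable node, which is the kind of state a planner or value function could inspect after every step.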
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques
