DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Rulin Shao; Akari Asai; Shannon Zejiang Shen; Hamish Ivison; Varsha Kishore; Jingming Zhuo; Xinran Zhao; Molly Park; Samuel G. Finlayson; David Sontag; Tyler Murray; Sewon Min; Pradeep Dasigi; Luca Soldaini; Faeze Brahman; Wen-tau Yih; Tongshuang Wu; Luke Zettlemoyer; Yoon Kim; Hannaneh Hajishirzi; Pang Wei Koh

arXiv:2511.19399·cs.CL·May 18, 2026

1 Repo 4 Models 3 Datasets

TL;DR

This paper introduces RLER, a reinforcement learning method whose rubrics evolve alongside the policy during training. It enables deep research agents to produce high-quality long-form answers, and the resulting model outperforms existing open models across multiple domains.

Contribution

The paper presents RLER, a novel reinforcement learning approach that co-evolves rubrics with the policy model to improve long-form research capabilities.

Findings

01. DR Tulu-8B outperforms existing open deep research agents by 15.6%.

02. DR Tulu-8B matches or exceeds proprietary agents by 0.7%.

03. DR Tulu-8B is 1000x cheaper per query than OpenAI DR.

Abstract

Deep research agents perform multi-step research to produce long-form, well-attributed answers. However, most open deep research agents are trained on easily verifiable short-form QA tasks via reinforcement learning with verifiable rewards, which does not extend to realistic long-form tasks. We address this with Reinforcement Learning with Evolving Rubrics (RLER), where rubrics are constructed and maintained to co-evolve with the policy model during training. This allows the rubrics to incorporate newly explored information from search and contrasting model responses, enabling better fact checking and more discriminative on-policy feedback. Using RLER, we develop Deep Research Tulu (DR Tulu-8B), the first fully open model that is directly trained for open-ended, long-form deep research. Across four long-form deep research benchmarks in science, healthcare, and general domains, DR Tulu…
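The co-evolution idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Rubric` and `RubricBuffer` classes, the keyword-based `check` functions, the `max_size` cap, and the rule of keeping rubrics by score spread across rollouts are all assumptions made for illustration. In practice a rubric would more plausibly be scored by a judge model than by keyword matching.

```python
from dataclasses import dataclass, field
from statistics import pstdev
from typing import Callable, List


@dataclass
class Rubric:
    description: str
    check: Callable[[str], float]  # hypothetical scorer, returns a value in [0, 1]


@dataclass
class RubricBuffer:
    rubrics: List[Rubric] = field(default_factory=list)
    max_size: int = 4  # assumed cap on the number of active rubrics

    def score(self, response: str) -> float:
        """Reward for one response: mean score over the active rubrics."""
        if not self.rubrics:
            return 0.0
        return sum(r.check(response) for r in self.rubrics) / len(self.rubrics)

    def evolve(self, rollouts: List[str], proposed: List[Rubric]) -> None:
        """Merge newly proposed rubrics, then keep only discriminative ones.

        Assumption: a rubric is discriminative if its scores vary across
        the current policy's rollouts (non-zero standard deviation).
        """
        if not rollouts:
            return
        spread = [
            (pstdev([r.check(x) for x in rollouts]), r)
            for r in self.rubrics + proposed
        ]
        kept = [pair for pair in spread if pair[0] > 0.0]
        kept.sort(key=lambda pair: pair[0], reverse=True)
        self.rubrics = [r for _, r in kept[: self.max_size]]


def keyword_rubric(keyword: str) -> Rubric:
    """Toy rubric: 1.0 if the response mentions the keyword, else 0.0."""
    return Rubric(
        description=f"mentions {keyword}",
        check=lambda resp: 1.0 if keyword in resp.lower() else 0.0,
    )


# Two on-policy rollouts for the same prompt (toy data).
rollouts = [
    "aspirin lowers fever",
    "aspirin lowers fever and thins blood",
]

# The initial rubric is satisfied by every rollout, so it no longer
# separates good answers from bad ones.
buffer = RubricBuffer(rubrics=[keyword_rubric("aspirin")])

# A rubric derived from contrasting the rollouts is proposed and merged.
buffer.evolve(rollouts, proposed=[keyword_rubric("thins blood")])

rewards = [buffer.score(r) for r in rollouts]
```

In this toy run, the saturated rubric is dropped and the newly proposed one remains, so the two rollouts receive different rewards. The point is only that rubrics refreshed against current rollouts keep the feedback discriminative.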

Code & Models

Repositories

rlresearch/dr-tulu (GitHub)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare