AI Can Learn Scientific Taste

Jingqi Tong; Mingzhe Li; Hangcheng Li; Yongzhuo Yang; Yurong Mou; Weijie Ma; Zhiheng Xi; Hongji Chen; Xiaoran Liu; Qinyuan Cheng; Ming Zhang; Qiguang Chen; Weifeng Ge; Qipeng Guo; Tianlei Ying; Tianxiang Sun; Yining Zheng; Xinchi Chen; Jun Zhao; Ning Ding; Xuanjing Huang; Yugang Jiang; Xipeng Qiu

arXiv:2603.14473·cs.CL·March 17, 2026

Open Access · 4 Models · 2 Datasets

TL;DR

This paper introduces a reinforcement learning framework that lets AI develop scientific taste by learning from community feedback, yielding research proposals with higher potential impact than those of state-of-the-art language models.

Contribution

It presents Reinforcement Learning from Community Feedback (RLCF), a new training paradigm that teaches AI scientific taste through preference modeling and alignment, a previously underexplored area.
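
As a rough illustration, the two stages can be written in the standard form used for reward modeling and KL-regularized RL fine-tuning. This is a sketch under assumptions, not the paper's stated objective: it assumes a Bradley-Terry model in which Scientific Judge's verdict is abstracted as a scalar score $r_\theta$, $(x_h, x_l)$ are the higher- and lower-cited papers of a field- and time-matched pair drawn from the pair dataset $\mathcal{D}$, $\sigma$ is the logistic sigmoid, $c$ is a prompt (for example, a research topic) drawn from $\mathcal{C}$, $\pi_{\mathrm{ref}}$ is a reference policy, and $\beta$ is a KL penalty weight.

```latex
% Preference modeling (assumed Bradley-Terry form)
P(x_h \succ x_l) = \sigma\big(r_\theta(x_h) - r_\theta(x_l)\big)

\mathcal{L}(\theta) = -\,\mathbb{E}_{(x_h,\,x_l)\sim\mathcal{D}}
  \Big[\log \sigma\big(r_\theta(x_h) - r_\theta(x_l)\big)\Big]

% Preference alignment (assumed KL-regularized RL objective, judge as reward)
\max_{\pi}\; \mathbb{E}_{c\sim\mathcal{C}}\Big[
  \mathbb{E}_{y\sim\pi(\cdot\mid c)}\big[r_\theta(y)\big]
  - \beta\,\mathrm{KL}\big(\pi(\cdot\mid c)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid c)\big)\Big]
```

Under this reading, the first stage teaches the judge to rank the more-cited paper of each matched pair higher, and the second stage pushes the policy toward ideas the judge scores highly while the KL term keeps it close to its starting behavior.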

Findings

01. Scientific Judge outperforms state-of-the-art (SOTA) LLMs on preference-judgment tasks.

02. Scientific Thinker proposes research ideas with higher potential impact.

03. The models generalize across research fields and to papers from later years.

Abstract

Great scientists have strong judgement and foresight, closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most related research focuses on improving an AI scientist's execution capabilities, while enhancing an AI's scientific taste remains underexplored. In this work, we propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and we formulate scientific taste learning as a preference modeling and alignment problem. For preference modeling, we train Scientific Judge to judge ideas on 700K field- and time-matched pairs of high- vs. low-citation papers. For preference alignment, we use Scientific Judge as a reward model to train a policy model, Scientific Thinker, to propose research ideas with high…
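
The pair construction in the preference-modeling step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes each paper record carries a research field, a publication year, and a citation count (the `Paper` schema and `matched_pairs` helper are hypothetical), buckets papers by (field, year) so every pair is field- and time-matched, and pairs the most-cited papers in a bucket with the least-cited ones. The `min_ratio` margin is likewise a hypothetical filter that keeps only pairs with a clear citation gap.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record schema; real metadata sources will differ.
    paper_id: str
    field: str
    year: int
    citations: int


def matched_pairs(papers, min_ratio=2.0):
    """Return (high, low) citation pairs drawn from the same (field, year) bucket.

    min_ratio is a hypothetical margin: a pair is kept only if
    high.citations >= min_ratio * (low.citations + 1).
    """
    buckets = defaultdict(list)
    for p in papers:
        buckets[(p.field, p.year)].append(p)

    pairs = []
    for key in sorted(buckets):
        ranked = sorted(buckets[key], key=lambda p: p.citations, reverse=True)
        half = len(ranked) // 2
        top = ranked[:half]           # most-cited half, descending
        bottom = ranked[::-1][:half]  # least-cited half, ascending
        # Pair the i-th most-cited paper with the i-th least-cited one.
        for high, low in zip(top, bottom):
            if high.citations >= min_ratio * (low.citations + 1):
                pairs.append((high, low))
    return pairs


if __name__ == "__main__":
    demo = [
        Paper("a", "NLP", 2020, 120),
        Paper("b", "NLP", 2020, 3),
        Paper("c", "NLP", 2020, 40),
        Paper("d", "NLP", 2020, 10),
        Paper("e", "CV", 2020, 50),   # alone in its bucket: never paired
        Paper("g", "NLP", 2021, 12),
        Paper("h", "NLP", 2021, 8),   # gap too small for min_ratio=2.0
    ]
    print([(h.paper_id, l.paper_id) for h, l in matched_pairs(demo)])
    # prints [('a', 'b'), ('c', 'd')]
```

Matching on field and year is what makes the comparison meaningful: citation counts vary widely between fields and accumulate with paper age, so unmatched pairs would largely teach a judge which field is bigger or which paper is older.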

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Expert finding and Q&A systems · Ethics and Social Impacts of AI