Advancing AI Research Assistants with Expert-Involved Learning

Tianyu Liu; Simeng Han; Hanchen Wang; Xiao Luo; Pan Lu; Biqing Zhu; Yuge Wang; Keyi Li; Jiapeng Chen; Rihao Qu; Yufeng Liu; Xinyue Cui; Aviv Yaish; Yuhang Chen; Minsheng Hao; Chuhan Li; Kexing Li; Yinsheng Lu; Xinyu Wei; Qinzhe Xing; Antonia Panescu; Mengbo Wang; Vibha Annaswamy; Alicia Sanchez; Jack Cloherty; Arman Cohan; Hua Xu; Mark Gerstein; James Zou; Hongyu Zhao

arXiv:2505.04638·cs.AI·April 8, 2026

TL;DR

ARIEL is an open-source framework that evaluates and improves biomedical AI assistants by integrating expert-vetted tasks, revealing current model limitations and enhancing capabilities through prompt engineering and fine-tuning.

Contribution

The paper introduces ARIEL, a comprehensive platform for assessing and optimizing biomedical AI models with expert-involved learning and multimodal evaluation.

Findings

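01. State-of-the-art models produce fluent but incomplete summaries of full-length articles.

02. Large multimodal models (LMMs) struggle with detailed visual reasoning in fine-grained figure interpretation.

03. Prompt engineering and lightweight fine-tuning substantially improve the textual coverage of summaries.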

Abstract

Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear. We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework that pairs a curated multimodal biomedical corpus with expert-vetted tasks to probe two capabilities: full-length article summarization and fine-grained figure interpretation. Using uniform protocols and blinded PhD-level evaluation, we find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning. We later observe that prompt engineering and lightweight fine-tuning substantially improve textual coverage, and a compute-scaled inference strategy enhances visual question answering. We build an ARIEL agent that integrates textual and visual cues, and we…
