From Task Executors to Research Partners: Evaluating AI Co-Pilots Through Workflow Integration in Biomedical Research
Lukas Weidener, Marko Brkić, Chiara Bacci, Mihailo Jovanović, Emre Ulgac, Alex Dobrin, Johannes Weniger, Martin Vlas, Ritvik Singh, Aakaash Meduri

TL;DR
This paper reviews current benchmarking practices for AI in biomedical research. It highlights their limitations in assessing AI systems as integrated research partners and proposes a new evaluation framework that emphasizes workflow integration and collaboration.
Contribution
It introduces a process-oriented evaluation framework that captures workflow integration, dialogue, and researcher experience, addressing gaps in existing benchmarks.
Findings
Current benchmarks assess only isolated AI capabilities.
AI systems often lack contextual memory and adaptive dialogue.
The proposed framework evaluates AI systems as collaborative research partners.
Abstract
Artificial intelligence systems are increasingly deployed in biomedical research. However, current evaluation frameworks may inadequately assess their effectiveness as research collaborators. This rapid review examines benchmarking practices for AI systems in preclinical biomedical research. Three major databases and two preprint servers were searched from January 1, 2018 to October 31, 2025, identifying 14 benchmarks that assess AI capabilities in literature understanding, experimental design, and hypothesis generation. The results revealed that all current benchmarks assess isolated component capabilities, including data analysis quality, hypothesis validity, and experimental protocol design. However, authentic research collaboration requires integrated workflows spanning multiple sessions, with contextual memory, adaptive dialogue, and constraint propagation. This gap implies that…
Taxonomy
Topics: Scientific Computing and Data Management · Artificial Intelligence in Healthcare and Education · Cell Image Analysis Techniques
