Automating Exploratory Proteomics Research via Language Models
Ning Ding, Shang Qu, Linhai Xie, Yifei Li, Zaoqu Liu, Kaiyan Zhang, Yibai Xiong, Yuxin Zuo, Zhangren Chen, Ermo Hua, Xingtai Lv, Youbang Sun, Yang Li, Dong Li, Fuchu He, Bowen Zhou

TL;DR
PROTEUS is an automated system leveraging large language models to analyze proteomics data, generate research hypotheses, and streamline scientific discovery without human intervention, significantly accelerating proteomics research workflows.
Contribution
This paper introduces PROTEUS, a novel AI system that automates proteomics data analysis and hypothesis generation using hierarchical planning and iterative refinement with LLMs.
Findings
PROTEUS generated 191 hypotheses across diverse datasets.
The system's results aligned well with existing literature.
PROTEUS demonstrated reliable and coherent analysis outputs.
Abstract
With the development of artificial intelligence, its contribution to science is evolving from simulating a complex problem to automating entire research processes and producing novel discoveries. Achieving this advancement requires both specialized general models grounded in real-world scientific data and iterative, exploratory frameworks that mirror human scientific methodologies. In this paper, we present PROTEUS, a fully automated system for scientific discovery from raw proteomics data. PROTEUS uses large language models (LLMs) to perform hierarchical planning, execute specialized bioinformatics tools, and iteratively refine analysis workflows to generate high-quality scientific hypotheses. The system takes proteomics datasets as input and produces a comprehensive set of research objectives, analysis results, and novel biological hypotheses without human intervention. We evaluated…
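The plan, execute, and refine loop described in the abstract can be illustrated with a minimal sketch. This is not the PROTEUS implementation: every function name below (`plan_objectives`, `choose_tool`, `run_tool`, `critique`, `hypothesize`) is hypothetical, the LLM calls and bioinformatics tools are replaced by deterministic stubs, and the acceptance rule is invented purely so the loop has something to iterate on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepResult:
    objective: str   # research objective proposed by the planner
    tool: str        # name of the (stubbed) analysis tool that was run
    output: str      # final tool output after refinement
    rounds: int      # number of refinement rounds that were used


def plan_objectives(dataset_summary: str) -> List[str]:
    """Stub for an LLM planner that proposes research objectives (assumption)."""
    return [
        f"differential expression in {dataset_summary}",
        f"pathway enrichment in {dataset_summary}",
    ]


def choose_tool(objective: str) -> str:
    """Stub for LLM-based tool selection (assumption): a keyword match."""
    return "enrichment" if "enrichment" in objective else "diff_expr"


def run_tool(tool: str, objective: str, attempt: int) -> str:
    """Stub for executing a bioinformatics tool; returns a labelled string."""
    return f"{tool} output for '{objective}' (attempt {attempt + 1})"


def critique(output: str, attempt: int) -> bool:
    """Stub for an LLM critic (assumption): accepts only the third attempt."""
    return attempt >= 2


def run_pipeline(dataset_summary: str, max_rounds: int = 3) -> List[StepResult]:
    """Top level: plan objectives; lower level: run and refine each analysis."""
    results = []
    for objective in plan_objectives(dataset_summary):
        tool = choose_tool(objective)
        output = ""
        rounds = 0
        for attempt in range(max_rounds):
            output = run_tool(tool, objective, attempt)
            rounds = attempt + 1
            if critique(output, attempt):
                break  # the critic accepted this result, stop refining
        results.append(StepResult(objective, tool, output, rounds))
    return results


def hypothesize(results: List[StepResult]) -> List[str]:
    """Stub for hypothesis generation: one placeholder string per result."""
    return [f"Hypothesis derived from {r.tool}: {r.objective}" for r in results]


if __name__ == "__main__":
    for line in hypothesize(run_pipeline("sample cohort")):
        print(line)
```

In a real system the stubs would be replaced by model calls and tool invocations; the sketch only shows the control flow of planning at one level and refining at another.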
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research · Scientific Computing and Data Management · Bioinformatics and Genomic Networks
Methods: ALIGN · Sparse Evolutionary Training
