OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning
Maciej Sypetkowski, Joanna Krawczyk, Łukasz Smoliński, Remigiusz Kinas, Przemysław Pietrzak, Tomasz Jetka, Rafał Powalski

TL;DR
OmicsLM is a multimodal large language model that combines quantitative transcriptomic profiles with natural-language instructions to support biological reasoning and question answering across multiple samples.
Contribution
The paper introduces a multimodal LLM that places omics profiles and natural language in a single model context, trained on more than 5.5 million instruction-following examples spanning over 70 task types.
Findings
OmicsLM performs comparably to specialized models on profile-level tasks.
It outperforms both specialized omics models and general LLMs on language-guided biological reasoning.
The model enables direct use of expression profiles for complex biological question answering.
Abstract
Interpreting transcriptomic data is one of the most common analytical tasks in modern biology. Yet most current models either consume expression profiles without producing natural-language biological explanations, or reason in language without direct access to quantitative omics measurements. We introduce OmicsLM, a multimodal LLM that connects quantitative omics profiles with natural-language biological tasks. OmicsLM represents each transcriptomic profile as a compact continuous representation within the LLM context. This interface preserves quantitative expression signal while allowing natural-language instructions, explicit gene mentions, and multiple interleaved biological samples to be processed together in one model context. We train OmicsLM on more than 5.5 million instruction-following examples spanning over 70 task types, combining continuous transcriptomic inputs,…
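The interface described in the abstract, where each profile becomes a compact continuous representation interleaved with text, can be pictured with a small sketch. Everything below is an assumption made for illustration: the abstract does not say how profiles are encoded, so a single linear projection of log-transformed expression stands in for the encoder, and the names `encode_profile`, `build_context`, `W_proj`, `tok_emb` and all dimensions are hypothetical. It is not the authors' implementation.

```python
import math
import random

random.seed(0)

# Hypothetical toy dimensions, chosen only to keep the example small.
N_GENES = 50    # genes per expression profile
D_MODEL = 8     # width of the LLM embedding space
K_TOKENS = 2    # continuous "tokens" produced per profile
VOCAB = 20      # size of a toy text vocabulary


def rand_matrix(rows, cols):
    """Random matrix standing in for learned weights."""
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


# Assumed parameters: a linear projection for profiles and a text embedding table.
W_proj = rand_matrix(N_GENES, K_TOKENS * D_MODEL)
tok_emb = rand_matrix(VOCAB, D_MODEL)


def encode_profile(expr):
    """Map one expression vector of length N_GENES to K_TOKENS embeddings."""
    x = [math.log1p(v) for v in expr]  # compress the dynamic range of counts
    flat = [
        sum(x[g] * W_proj[g][j] for g in range(N_GENES))
        for j in range(K_TOKENS * D_MODEL)
    ]
    return [flat[k * D_MODEL:(k + 1) * D_MODEL] for k in range(K_TOKENS)]


def build_context(segments):
    """Interleave text token ids and expression profiles into one sequence.

    segments: list of ("text", list of token ids) or ("profile", list of values).
    Returns a list of D_MODEL-wide embedding vectors in input order.
    """
    sequence = []
    for kind, value in segments:
        if kind == "text":
            sequence.extend(tok_emb[i] for i in value)
        elif kind == "profile":
            sequence.extend(encode_profile(value))
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return sequence


# Two samples interleaved with instruction text in a single context.
sample_a = [random.randint(0, 20) for _ in range(N_GENES)]
sample_b = [random.randint(0, 20) for _ in range(N_GENES)]
context = build_context([
    ("text", [1, 2, 3]),
    ("profile", sample_a),
    ("text", [4, 5]),
    ("profile", sample_b),
    ("text", [6, 7, 8, 9]),
])
print(len(context), len(context[0]))  # 13 8
```

In this sketch each profile costs only `K_TOKENS` positions regardless of how many genes it covers, which is one way a "compact" representation could leave room for several samples and explicit gene mentions in the same context.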
