COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models

Yuchen Ren; Wenwei Han; Qianyuan Zhang; Yining Tang; Weiqiang Bai; Yuchen Cai; Lifeng Qiao; Hao Jiang; Dong Yuan; Tao Chen; Siqi Sun; Pan Tan; Wanli Ouyang; Nanqing Dong; Xinzhu Ma; Peng Ye

arXiv:2412.10347·q-bio.BM·December 16, 2024

TL;DR

COMET is the first comprehensive benchmark designed to evaluate machine learning models across diverse biological multi-omics tasks, facilitating better model selection and advancing integrated biological data analysis.

Contribution

It introduces a new multi-omics benchmark with curated datasets and evaluates existing language models, providing insights into their effectiveness for complex biological data integration.

Findings

01. Existing models show varied performance across omics tasks.

02. Multi-omics models outperform single-omics models in integration tasks.

03. Benchmark highlights key challenges and future directions in multi-omics research.

Abstract

As key elements within the central dogma, DNA, RNA, and proteins play crucial roles in maintaining life by guaranteeing accurate genetic expression and implementation. Although research on these molecules has profoundly impacted fields like medicine, agriculture, and industry, the diversity of machine learning approaches, from traditional statistical methods to deep learning models and large language models, poses challenges for researchers in choosing the most suitable models for specific tasks, especially for cross-omics and multi-omics tasks, due to the lack of comprehensive benchmarks. To address this, we introduce the first comprehensive multi-omics benchmark COMET (Benchmark for Biological COmprehensive Multi-omics Evaluation Tasks and Language Models), designed to evaluate models across single-omics, cross-omics, and multi-omics tasks. First, we curate and develop a diverse…
