Omni-DNA: A Unified Genomic Foundation Model for Cross-Modal and Multi-Task Learning
Zehui Li, Vallijah Subasri, Yifei Shen, Dongsheng Li, Yiren Zhao, Guy-Bart Stan, Caihua Shan

TL;DR
Omni-DNA introduces a unified transformer-based model capable of multi-task and cross-modal genomic analysis, reducing the need for separate fine-tuning and enabling complex applications like DNA-to-text and DNA-to-image mapping.
Contribution
The paper presents Omni-DNA, a novel multi-task, cross-modal genomic foundation model that achieves state-of-the-art results and demonstrates broad applicability across diverse genomic tasks.
Findings
Achieves state-of-the-art performance on 18 out of 26 genomic tasks.
Successfully performs multi-task fine-tuning on 10 epigenetic modification tasks.
Demonstrates cross-modal capabilities with DNA-to-text and DNA-to-image tasks.
Abstract
Large Language Models (LLMs) demonstrate remarkable generalizability across diverse tasks, yet genomic foundation models (GFMs) still require separate finetuning for each downstream application, creating significant overhead as model sizes grow. Moreover, existing GFMs are constrained by rigid output formats, limiting their applicability to various genomic tasks. In this work, we revisit the transformer-based auto-regressive models and introduce Omni-DNA, a family of cross-modal multi-task models ranging from 20 million to 1 billion parameters. Our approach consists of two stages: (i) pretraining on DNA sequences with next token prediction objective, and (ii) expanding the multi-modal task-specific tokens and finetuning for multiple downstream tasks simultaneously. When evaluated on the Nucleotide Transformer and GB benchmarks, Omni-DNA achieves state-of-the-art performance on 18 out of…
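The second stage described in the abstract, adding task-specific tokens to the vocabulary and fine-tuning on several tasks at once with the same next-token objective, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the single-nucleotide vocabulary, the token strings such as `<task:promoter>` and `<yes>`, and the helper functions are hypothetical and are not taken from the Omni-DNA code or tokenizer.

```python
from typing import Dict, List, Tuple

# Hypothetical base vocabulary of single nucleotides (an assumption;
# the summary does not describe the tokenizer Omni-DNA actually uses).
BASE_VOCAB: Dict[str, int] = {"A": 0, "C": 1, "G": 2, "T": 3}


def expand_vocab(vocab: Dict[str, int], new_tokens: List[str]) -> Dict[str, int]:
    """Return a copy of vocab with new tokens appended at the next free ids."""
    expanded = dict(vocab)
    for tok in new_tokens:
        if tok not in expanded:
            expanded[tok] = len(expanded)
    return expanded


def encode_example(dna: str, task_token: str, label_token: str,
                   vocab: Dict[str, int]) -> List[int]:
    """Encode one example as: DNA token ids, then task token, then label token."""
    ids = [vocab[base] for base in dna]
    ids.append(vocab[task_token])
    ids.append(vocab[label_token])
    return ids


def next_token_pairs(ids: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs for a next-token prediction objective."""
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]


# Hypothetical task and label tokens for two classification tasks.
vocab = expand_vocab(BASE_VOCAB, ["<task:promoter>", "<task:enhancer>", "<yes>", "<no>"])
example = encode_example("ACGT", "<task:promoter>", "<yes>", vocab)
pairs = next_token_pairs(example)
```

Because the task and the label are both ordinary tokens in the sequence, examples from different tasks can be mixed in one training set and learned with a single output head. That is the general idea behind avoiding a separate fine-tuned model per task, shown here only in toy form.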
Code & Models
- 🤗 zehui127/Omni-DNA-116M (model) · 60 downloads
- 🤗 zehui127/Omni-DNA-20M (model) · 365 downloads · ♡ 1
- 🤗 zehui127/Omni-DNA-60M (model) · 55 downloads · ♡ 1
- 🤗 zehui127/Omni-DNA-300M (model) · 70 downloads
- 🤗 zehui127/Omni-DNA-1B (model) · 84 downloads · ♡ 2
- 🤗 zehui127/Omni-DNA-700M (model) · 95 downloads
- 🤗 zehui127/Omni-DNA-Multitask (model) · 2 downloads
- 🤗 zehui127/Omni-DNA-DNA2Function (model) · 1 download
Taxonomy
Topics: DNA and Biological Computing
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam
