Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification
Zicheng Liu, Siyuan Li, Zhiyuan Chen, Chang Yu, Qirong Yang, Yucheng Guo, Yujie Yang, Xiaoming Zhang, Stan Z. Li

TL;DR
Life-Code introduces a unified multi-omics modeling framework based on the central dogma, integrating DNA, RNA, and proteins through novel data pipelines and architectures to better understand genetic interactions.
Contribution
Life-Code unifies multi-omics data in a single framework and models the interactions between genetic molecules by transforming RNA and protein sequences into nucleotide-based sequences and encoding them with a codon tokenizer and a hybrid long-sequence architecture.
Findings
Achieves state-of-the-art results across multiple multi-omics tasks
Effectively models interactions between coding and non-coding regions
Demonstrates improved understanding of genetic sequence complexities
Abstract
The interactions between DNA, RNA, and proteins are fundamental to biological processes, as illustrated by the central dogma of molecular biology. Although modern biological pre-trained models have achieved great success in analyzing these macromolecules individually, their interconnected nature remains underexplored. This paper follows the guidance of the central dogma to redesign both the data and model pipeline and offers a comprehensive framework, Life-Code, that spans different biological functions. As for data flow, we propose a unified pipeline to integrate multi-omics data by reverse-transcribing RNA and reverse-translating amino acids into nucleotide-based sequences. As for the model, we design a codon tokenizer and a hybrid long-sequence architecture to encode the interactions between coding and non-coding regions through masked modeling pre-training. To model the translation…
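The data flow described in the abstract (mapping RNA and protein sequences into nucleotide space, then tokenizing by codon) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `AA_TO_CODON` table, and the choice of a single fixed codon per amino acid are assumptions made for illustration, and the abstract does not say how Life-Code resolves synonymous codons. Reverse transcription is simplified here to a U-to-T substitution on the same strand.

```python
from typing import Dict, List

# Hypothetical table: one codon per amino acid, taken from the standard genetic
# code. The pick among synonymous codons is arbitrary (an assumption of this sketch).
AA_TO_CODON: Dict[str, str] = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
    "*": "TAA",  # stop
}


def reverse_transcribe(rna: str) -> str:
    """Map an RNA sequence to its DNA-alphabet equivalent (U -> T)."""
    return rna.upper().replace("U", "T")


def reverse_translate(protein: str) -> str:
    """Map an amino-acid sequence to a nucleotide sequence, one codon per residue."""
    return "".join(AA_TO_CODON[aa] for aa in protein.upper())


def codon_tokenize(dna: str) -> List[str]:
    """Split a nucleotide sequence into non-overlapping 3-mers.

    A trailing fragment shorter than three bases is dropped.
    """
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    rna_as_dna = reverse_transcribe("AUGGCU")
    protein_as_dna = reverse_translate("MA")
    print(codon_tokenize(rna_as_dna))
    print(codon_tokenize(protein_as_dna))
```

With this table, the RNA `AUGGCU` and the peptide `MA` land on the same codon tokens, which is the sense in which all three modalities share one nucleotide-based vocabulary.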
Taxonomy
Topics: Bioinformatics and Genomic Networks
Methods: Knowledge Distillation
