Central Dogma Transformer: Towards Mechanism-Oriented AI for Cellular Understanding
Nobuyuki Ota

TL;DR
The paper introduces the Central Dogma Transformer (CDT), an architecture that integrates pre-trained DNA, RNA, and protein language models along the direction of the Central Dogma, aiming at interpretable, mechanism-oriented prediction of cellular behavior.
Contribution
It presents a mechanism-oriented transformer architecture that models the directional flow of biological information across molecular modalities, advancing cellular understanding.
Findings
Achieved a Pearson correlation of 0.503 on CRISPRi data, capturing 63% of the theoretical maximum.
Attention and gradient analyses reveal distinct genomic regions involved in regulation.
Demonstrated the model's potential for mechanistic interpretability in cellular processes.
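To make the headline numbers concrete, here is a minimal sketch of how a Pearson correlation and a "fraction of ceiling" figure could be computed. It assumes the 63% figure is the plain ratio of the achieved correlation to the ceiling, which this summary does not state. The helper names and toy values are hypothetical and are not the paper's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / denom


def fraction_of_ceiling(r, ceiling):
    """Share of an attainable correlation ceiling that r reaches (assumed: simple ratio)."""
    return r / ceiling


# Toy predicted vs. observed perturbation effects (illustrative values only).
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0.00, 0.50, 0.30, 0.90]
r_toy = pearson_r(predicted, observed)

# Under the ratio assumption, the reported r = 0.503 at 63% implies a ceiling
# of roughly 0.503 / 0.63. The 63% is rounded, so this is only approximate.
implied_ceiling = 0.503 / 0.63
```

Under that assumption, the implied ceiling is close to 0.80, so a model that reproduced the data up to its noise limit would not be expected to reach a correlation of 1.0.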
Abstract
Understanding cellular mechanisms requires integrating information across DNA, RNA, and protein - the three molecular systems linked by the Central Dogma of molecular biology. While domain-specific foundation models have achieved success for each modality individually, they remain isolated, limiting our ability to model integrated cellular processes. Here we present the Central Dogma Transformer (CDT), an architecture that integrates pre-trained language models for DNA, RNA, and protein following the directional logic of the Central Dogma. CDT employs directional cross-attention mechanisms - DNA-to-RNA attention models transcriptional regulation, while RNA-to-Protein attention models translational relationships - producing a unified Virtual Cell Embedding that integrates all three modalities. We validate CDT v1 - a proof-of-concept implementation using fixed (non-cell-specific) RNA and…
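The directional cross-attention described in the abstract can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the CDT implementation. It assumes the downstream modality supplies the queries and the upstream modality supplies keys and values, that residual connections are used, and that the Virtual Cell Embedding is obtained by mean pooling. Random arrays stand in for the pre-trained DNA, RNA, and protein embeddings, and all names and dimensions are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d) tokens of the downstream modality
    context: (n_c, d) tokens of the upstream modality
    Returns the attended values (n_q, d) and the attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
d = 16  # shared embedding width (assumed)

# Stand-ins for embeddings from frozen pre-trained models (random here).
dna = rng.normal(size=(10, d))      # 10 DNA tokens
rna = rng.normal(size=(5, d))       # 5 RNA tokens
protein = rng.normal(size=(4, d))   # 4 protein tokens


def make_projections():
    """Random query/key/value projection matrices for one attention layer."""
    return [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]


# DNA -> RNA: RNA tokens attend over DNA tokens (assumed direction of queries).
rna_context, dna_to_rna = cross_attention(rna, dna, *make_projections())
rna_updated = rna + rna_context  # residual connection (assumed)

# RNA -> Protein: protein tokens attend over the DNA-informed RNA tokens.
protein_context, rna_to_protein = cross_attention(
    protein, rna_updated, *make_projections()
)
protein_updated = protein + protein_context

# Hypothetical pooling into one vector that carries all three modalities.
virtual_cell_embedding = protein_updated.mean(axis=0)
```

In a layout like this, each row of `dna_to_rna` is a distribution over DNA positions, and weights of this kind are what an attention-based interpretability analysis would inspect.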
Taxonomy
Topics: Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks · Cell Image Analysis Techniques
