Central Dogma Transformer III: Interpretable AI Across DNA, RNA, and Protein
Nobuyuki Ota

TL;DR
CDT-III is an interpretable AI model spanning DNA, RNA, and protein: it predicts RNA and protein responses, and its predictions extend in silico to clinical side effects.
Contribution
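CDT-III extends mechanism-oriented AI across the full central dogma with a two-stage architecture (VCE-N for transcription, VCE-C for translation). Adding protein prediction improves both predictive accuracy (RNA r=0.804 to 0.843) and DNA-level interpretability (CTCF enrichment up 30%).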
Findings
Achieves high correlation in gene expression predictions (RNA r=0.843, protein r=0.969)
Reveals opposite mRNA and protein response patterns in genes
Successfully predicts protein responses and clinical side effects in silico
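The per-gene correlations quoted above are Pearson r values. As a minimal sketch of how such a metric can be computed, assuming measured and predicted responses are arranged as (samples x genes) arrays: the function name, array layout, and synthetic data below are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def per_gene_pearson(y_true, y_pred):
    """Pearson r for each gene (column), computed across samples (rows).

    Hypothetical helper for illustration; a gene with constant values
    has zero variance and yields NaN here.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Center each gene's values across samples.
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)
    # Covariance over the product of standard deviations, per gene.
    num = (t * p).sum(axis=0)
    den = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    return num / den

# Synthetic placeholder data: 200 samples x 5 held-out genes (not real results).
rng = np.random.default_rng(0)
measured = rng.normal(size=(200, 5))
predicted = measured + 0.5 * rng.normal(size=(200, 5))

r = per_gene_pearson(measured, predicted)  # one r per gene
mean_r = r.mean()                          # summary across the held-out genes
```

Reporting one r per gene and then averaging keeps a single high-variance gene from dominating the score, which can happen when all genes are pooled into one correlation.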
Abstract
Biological AI models increasingly predict complex cellular responses, yet their learned representations remain disconnected from the molecular processes they aim to capture. We present CDT-III, which extends mechanism-oriented AI across the full central dogma: DNA, RNA, and protein. Its two-stage Virtual Cell Embedder architecture mirrors the spatial compartmentalization of the cell: VCE-N models transcription in the nucleus and VCE-C models translation in the cytosol. On five held-out genes, CDT-III achieves per-gene RNA r=0.843 and protein r=0.969. Adding protein prediction improves RNA performance (r=0.804 to 0.843), demonstrating that downstream tasks regularize upstream representations. Protein supervision sharpens DNA-level interpretability, increasing CTCF enrichment by 30%. Analysis of experimentally measured mRNA and protein responses reveals that the majority of…
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Machine Learning in Bioinformatics
