Central Dogma Transformer III: Interpretable AI Across DNA, RNA, and Protein

Nobuyuki Ota

arXiv:2603.23361 · cs.LG · March 27, 2026

TL;DR

CDT-III is an interpretable AI model that spans the full central dogma (DNA, RNA, and protein). It predicts RNA and protein responses while keeping its learned representations tied to the underlying molecular processes, with potential clinical applications.

Contribution

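CDT-III extends mechanism-oriented AI across the full central dogma with a two-stage Virtual Cell Embedder architecture: VCE-N models transcription in the nucleus and VCE-C models translation in the cytosol. Adding protein prediction improves RNA performance (r=0.804 to 0.843) and sharpens DNA-level interpretability, increasing CTCF enrichment by 30%.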

Findings

01. Achieves high per-gene correlation on five held-out genes (RNA r=0.843, protein r=0.969)

02. Reveals opposite mRNA and protein response patterns in genes

03. Predicts protein responses and clinical side effects in silico

Abstract

Biological AI models increasingly predict complex cellular responses, yet their learned representations remain disconnected from the molecular processes they aim to capture. We present CDT-III, which extends mechanism-oriented AI across the full central dogma: DNA, RNA, and protein. Its two-stage Virtual Cell Embedder architecture mirrors the spatial compartmentalization of the cell: VCE-N models transcription in the nucleus and VCE-C models translation in the cytosol. On five held-out genes, CDT-III achieves per-gene RNA r=0.843 and protein r=0.969. Adding protein prediction improves RNA performance (r=0.804 to 0.843), demonstrating that downstream tasks regularize upstream representations. Protein supervision sharpens DNA-level interpretability, increasing CTCF enrichment by 30%. Analysis of experimentally measured mRNA and protein responses reveals that the majority of…
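The per-gene r values quoted above are correlation coefficients between predicted and measured responses. As a minimal sketch of how such a metric is typically computed, the snippet below calculates a Pearson r separately for each gene. It is not taken from the paper: the function names, the gene labels, and the toy numbers are all hypothetical, and the paper's exact evaluation protocol (for example, how per-gene values are aggregated into a single number) is an assumption left open here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance terms (unnormalised; the 1/n factors cancel).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def per_gene_pearson(predicted, observed):
    """Compute one Pearson r per gene.

    Both arguments map a gene name to a list of responses, one value per
    condition (hypothetical layout, chosen only for illustration).
    """
    return {gene: pearson_r(predicted[gene], observed[gene]) for gene in predicted}


if __name__ == "__main__":
    # Toy values, invented for illustration; not data from the paper.
    predicted = {
        "GENE_A": [0.1, 0.4, 0.9, 1.3],
        "GENE_B": [1.0, 0.7, 0.2, 0.1],
    }
    observed = {
        "GENE_A": [0.2, 0.5, 1.1, 1.2],
        "GENE_B": [0.9, 0.8, 0.3, 0.0],
    }
    for gene, r in per_gene_pearson(predicted, observed).items():
        print(f"{gene}: r = {r:.3f}")
```

Computing r per gene, rather than pooling all genes into one vector, keeps genes with large dynamic range from dominating the score, which is one plausible reason to report the metric this way.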

Taxonomy

Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Machine Learning in Bioinformatics