TrinityDNA: A Bio-Inspired Foundational Model for Efficient Long-Sequence DNA Modeling
Qirong Yang, Yucheng Guo, Zicheng Liu, Yujie Yang, Qijin Yin, Siyuan Li, Shaomin Ji, Linlin Chao, Xiaoming Zhang, and Stan Z. Li

TL;DR
TrinityDNA is a biologically inspired DNA foundation model that captures long-range dependencies in DNA sequences through structure-aware components (Groove Fusion and Gated Reverse Complement), multi-scale attention, and an evolutionary training strategy.
Contribution
The paper introduces TrinityDNA, a DNA model that combines biologically informed modules with a multi-scale attention mechanism for long-sequence DNA modeling.
Findings
Enhanced gene function prediction accuracy
Improved regulatory mechanism discovery
Established a new DNA long-sequence CDS annotation benchmark
Abstract
The modeling of genomic sequences presents unique challenges due to their length and structural complexity. Traditional sequence models struggle to capture long-range dependencies and biological features inherent in DNA. In this work, we propose TrinityDNA, a novel DNA foundational model designed to address these challenges. The model integrates biologically informed components, including Groove Fusion for capturing DNA's structural features and Gated Reverse Complement (GRC) to handle the inherent symmetry of DNA sequences. Additionally, we introduce a multi-scale attention mechanism that allows the model to attend to varying levels of sequence dependencies, and an evolutionary training strategy that progressively adapts the model to both prokaryotic and eukaryotic genomes. TrinityDNA provides a more accurate and efficient approach to genomic sequence modeling, offering significant…
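The symmetry that the abstract refers to is the reverse-complement relationship between the two DNA strands: reading one strand backwards and swapping A with T and C with G yields the other strand. The sketch below illustrates only that relationship. It is not the paper's Gated Reverse Complement module, whose gating details are not given in this summary, and the function names are hypothetical.

```python
# Illustrative sketch of DNA reverse-complement symmetry.
# Assumption: sequences are plain strings over A/C/G/T (N = unknown base).
# This is NOT the paper's GRC module; names here are hypothetical.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_reverse_palindrome(seq: str) -> bool:
    """True if a sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(is_reverse_palindrome("GAATTC"))
```

A model that respects this symmetry would typically see both `seq` and `reverse_complement(seq)` and combine the two views; how TrinityDNA gates that combination is described in the paper itself.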
Taxonomy
Topics: Genomics and Chromatin Dynamics · Machine Learning in Bioinformatics · Fractal and DNA sequence analysis
