Probing 3D Chromatin Structure Awareness in Evo2 DNA Language Model
UkJin Lee (Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, New York, NY, USA)

TL;DR
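This study tests whether the Evo2 DNA language model has learned 3D chromatin structure. It finds that Evo2 captures local CTCF grammar but not higher-order 3D organization, which the author takes as an argument for bidirectional architectures that integrate cell-type and 3D-contact information rather than simply longer contexts.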
Contribution
The paper provides the first systematic evaluation of whether Evo2 has learned 3D chromatin features, and it reveals limitations in how the model captures higher-order genome organization.
Findings
Evo2 does not distinguish functional perturbations from matched random controls.
Evo2 fails to reliably generate convergent CTCF loops.
Evo2 recovers TAD boundaries only partially.
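To make "convergent CTCF" concrete, here is a minimal sketch of how one might check a generated sequence for convergent motif pairs. It is not the paper's pipeline. The regular expressions encode a commonly quoted CTCF core consensus and its reverse complement, which is an assumption here (a real analysis would more likely use a position weight matrix), and `find_sites`, `convergent_pairs`, and `max_span` are hypothetical names introduced for illustration. A pair counts as convergent when a forward-strand motif lies upstream of a reverse-strand motif.

```python
import re

# Assumed CTCF core consensus (forward strand) and its reverse complement.
# Lookaheads let overlapping hits be reported as well.
FORWARD = re.compile(r"(?=(CCGCG[ACGT]GG[ACGT]GGCAG))")
REVERSE = re.compile(r"(?=(CTGCC[ACGT]CC[ACGT]CGCGG))")


def find_sites(seq):
    """Return sorted (position, strand) tuples for every motif hit."""
    sites = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    sites += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(sites)


def convergent_pairs(sites, max_span):
    """Pair each '+' site with every downstream '-' site within max_span bp."""
    pairs = []
    for i, (left, left_strand) in enumerate(sites):
        if left_strand != "+":
            continue
        for right, right_strand in sites[i + 1:]:
            if right - left > max_span:
                break  # sites are sorted, so later ones are even farther away
            if right_strand == "-":
                pairs.append((left, right))
    return pairs


# Toy sequences: the same two motif instances in convergent and divergent order.
fwd, rev = "CCGCGAGGTGGCAG", "CTGCCACCTCGCGG"
convergent = "A" * 10 + fwd + "T" * 50 + rev + "A" * 10
divergent = "A" * 10 + rev + "T" * 50 + fwd + "A" * 10

print(convergent_pairs(find_sites(convergent), max_span=1000))
print(convergent_pairs(find_sites(divergent), max_span=1000))
```

On real generations, the fraction of motif pairs that are convergent would be compared against the fraction expected if orientations were assigned at random.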
Abstract
DNA language models like Evo2 now fit million-token contexts large enough to cover entire TADs, yet whether they learn 3D chromatin structure, a key regulatory layer acting atop primary sequence, remains untested and questionable, given that Evo2's training data includes prokaryotes lacking this structure. We probed Evo2-7B on TAD boundaries and convergent CTCF loops in 1 Mb windows using two complementary tests: likelihood-based perturbation and sequence generation. Evo2 did not distinguish functional perturbations from matched random controls and failed to reliably generate convergent CTCF loops, recovering TAD boundaries only partially. Together, these results indicate that Evo2 has learned local CTCF grammar but misses higher-order 3D organization, pointing to bidirectional model architectures integrating cell types and 3D contacts, rather than longer contexts, as the path to…
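The likelihood-based perturbation test described in the abstract can be sketched as follows, under loudly stated assumptions. `toy_log_likelihood` is a hypothetical stand-in scorer, not Evo2. Shuffling a window is one possible perturbation, and the paper may use a different one. The control scheme (same-width windows at random non-overlapping positions) and the empirical p-value are illustrative choices. The idea is to perturb a functional site, perturb matched random sites, and ask whether the drop in sequence log-likelihood at the functional site stands out from the controls.

```python
import random

# One instance of a CTCF-like core motif, used here as a toy functional element.
MOTIF = "CCGCGAGGTGGCAG"


def toy_log_likelihood(seq):
    """Hypothetical scorer (NOT Evo2): rewards the motif and CG dinucleotides."""
    return 2.0 * seq.count(MOTIF) + 0.1 * seq.count("CG")


def shuffle_window(seq, start, width, rng):
    """Shuffle the bases inside seq[start:start+width], preserving composition."""
    window = list(seq[start:start + width])
    rng.shuffle(window)
    return seq[:start] + "".join(window) + seq[start + width:]


def perturbation_test(seq, site_start, width, score, n_controls=99, seed=0):
    """Compare the score change at a functional site with matched random windows."""
    rng = random.Random(seed)
    base = score(seq)
    functional_delta = score(shuffle_window(seq, site_start, width, rng)) - base
    control_deltas = []
    while len(control_deltas) < n_controls:
        start = rng.randrange(0, len(seq) - width + 1)
        if abs(start - site_start) < width:
            continue  # skip windows that overlap the functional site
        control_deltas.append(score(shuffle_window(seq, start, width, rng)) - base)
    # Empirical one-sided p-value: how often a control is at least as damaging.
    n_as_extreme = sum(d <= functional_delta for d in control_deltas)
    p_value = (n_as_extreme + 1) / (n_controls + 1)
    return functional_delta, control_deltas, p_value


# Toy data: random background with the motif planted at position 1000.
bg_rng = random.Random(1)
background = "".join(bg_rng.choice("ACGT") for _ in range(2000))
site = 1000
sequence = background[:site] + MOTIF + background[site + len(MOTIF):]

functional_delta, control_deltas, p_value = perturbation_test(
    sequence, site, len(MOTIF), toy_log_likelihood
)
print(functional_delta, min(control_deltas), p_value)
```

Because the toy scorer is motif-aware by construction, the functional perturbation separates cleanly from the controls. The result reported for Evo2 corresponds to the opposite case, where the functional delta falls inside the control distribution and the p-value is large.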
