Evaluating DNA function understanding in genomic language models using evolutionarily implausible sequences

Shiyu Jiang; Xuyin Liu; Zitong Jerry Wang

arXiv:2506.10271 · q-bio.QM · August 27, 2025

TL;DR

This paper evaluates whether genomic language models understand DNA function by testing if they can detect loss-of-function mutations in synthetic sequences with little evolutionary precedent, revealing limitations in their mechanistic understanding of gene regulation.

Contribution

Introduces Nullsettes, a benchmark for assessing DNA models' ability to predict loss-of-function mutations in synthetic sequences, highlighting current models' reliance on evolutionary patterns.

Findings

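01 Most of the 12 gLMs tested fail to consistently detect strong LOF mutations in synthetic expression cassettes.

02 Predictive accuracy drops sharply for all models as the likelihood they assign to the original (nonmutant) sequence decreases.

03 The results suggest that models rely heavily on pattern-matching to their evolutionary prior rather than on mechanistic understanding of gene expression.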

Abstract

Genomic language models (gLMs) hold promise for generating novel, functional DNA sequences for synthetic biology. However, realizing this potential requires models to go beyond evolutionary plausibility and understand how DNA sequence encodes gene expression and regulation. We introduce a benchmark called Nullsettes, which assesses how well models can predict in silico loss-of-function (LOF) mutations in synthetic expression cassettes with little evolutionary precedent. Testing 12 state-of-the-art gLMs, we find that most fail to consistently detect these strong LOF mutations. All models show a sharp drop in predictive accuracy as the likelihood assigned to the original (nonmutant) sequence decreases, suggesting that gLMs rely heavily on pattern-matching to their evolutionary prior rather than on any mechanistic understanding of gene expression. Our findings highlight fundamental…
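
The kind of test the abstract describes can be illustrated with a minimal sketch: score a nonmutant sequence and its LOF mutant, then check whether the mutant receives the lower likelihood. Everything below is an assumption made for illustration, not the paper's protocol. The k-mer frequency scorer is a toy stand-in for a gLM, and the names make_toy_scorer, llr, and detection_rate, the zero threshold, and the example sequences are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple


def make_toy_scorer(training_seqs: Sequence[str], k: int = 3,
                    alpha: float = 1.0) -> Callable[[str], float]:
    """Hypothetical stand-in for a gLM: a smoothed k-mer frequency scorer."""
    counts = Counter()
    for seq in training_seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    vocab = 4 ** k  # number of possible DNA k-mers

    def mean_log_score(seq: str) -> float:
        # Average smoothed log-frequency of the k-mers in the sequence.
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        logs = [math.log((counts[m] + alpha) / (total + alpha * vocab))
                for m in kmers]
        return sum(logs) / len(logs)

    return mean_log_score


def llr(score: Callable[[str], float], wild_type: str, mutant: str) -> float:
    """Log-likelihood ratio: negative when the mutant looks less plausible."""
    return score(mutant) - score(wild_type)


def detection_rate(score: Callable[[str], float],
                   pairs: Sequence[Tuple[str, str]],
                   threshold: float = 0.0) -> float:
    """Fraction of (wild type, mutant) pairs whose LLR falls below threshold."""
    hits = sum(llr(score, wt, mut) < threshold for wt, mut in pairs)
    return hits / len(pairs)


# Illustrative data only: the "prior" has seen a TATA-like motif, and the
# mutant replaces that motif with a sequence the prior has never seen.
score = make_toy_scorer(["TATAAAGGCTATAAAGGC"])
wt = "GGCTATAAAGGC"
mut = "GGCGCGCGCGGC"
print(llr(score, wt, mut), detection_rate(score, [(wt, mut)]))
```

The toy scorer flags this mutant only because the disrupted motif was frequent in its training data. That is pattern-matching to a prior, the behavior the abstract attributes to gLMs, so a scorer of this kind would have no basis for judging a cassette built from sequences it has rarely seen.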

Code & Models

Repositories

cellethology/glm-nullsette-benchmark
PyTorch · Official

Taxonomy

Methods: Sparse Evolutionary Training