Rethinking Genomic Modeling Through Optical Character Recognition
Hongxin Xiang, Pengsen Ma, Yunkang Cao, Di Yu, Haowen Chen, Xinyu Yang, Xiangxiang Zeng

TL;DR
OpticalDNA introduces a vision-based OCR-style approach to genomic modeling, enabling more efficient and detailed understanding of DNA sequences by rendering them into visual layouts and training a specialized vision-language model.
Contribution
This work pioneers a visual OCR-inspired framework for genomic modeling, moving beyond traditional sequential token methods to improve efficiency and information retention.
Findings
Outperforms recent baselines across diverse genomic benchmarks.
Uses nearly 20x fewer effective tokens on sequences of up to 450k bases.
Uses only 256k trainable parameters to surpass models with up to 985x more parameters.
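To make the scale of these two figures concrete, here is a back-of-the-envelope sketch in Python. It rests on assumptions not stated in the source: that the baseline spends one token per base, that "256k" means 256,000, and that "20x" and "985x" are taken at face value. The function names are hypothetical.

```python
def compressed_token_count(num_bases: int, reduction_factor: int) -> int:
    """Tokens left after compression, assuming a baseline of one token per base."""
    return num_bases // reduction_factor


def larger_model_parameters(trainable_params: int, size_ratio: int) -> int:
    """Parameter count of a model that is `size_ratio` times larger."""
    return trainable_params * size_ratio


# Assumed inputs: 450k bases, 20x reduction, 256,000 trainable parameters, 985x ratio.
print(compressed_token_count(450_000, 20))    # 22500
print(larger_model_parameters(256_000, 985))  # 252160000
```

Under these assumptions, a 450k-base sequence would need on the order of 22,500 effective tokens, and the largest model compared would have roughly 250 million parameters.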
Abstract
Recent genomic foundation models largely adopt large language model architectures that treat DNA as a one-dimensional token sequence. However, exhaustive sequential reading is structurally misaligned with sparse and discontinuous genomic semantics, leading to wasted computation on low-information background and preventing understanding-driven compression for long contexts. Here, we present OpticalDNA, a vision-based framework that reframes genomic modeling as Optical Character Recognition (OCR)-style document understanding. OpticalDNA renders DNA into structured visual layouts and trains an OCR-capable vision-language model with a visual DNA encoder and a document decoder, where the encoder produces compact, reconstructible visual tokens for high-fidelity compression. Building on this representation, OpticalDNA defines prompt-conditioned objectives over core genomic…
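The idea of rendering DNA into a structured visual layout can be illustrated with a minimal sketch. This is not the paper's rendering pipeline, whose details are not given here. It assumes a simple fixed-width "page" in which each base becomes one integer cell, and a square patch grid standing in for visual tokens. The names `render_dna_page`, `count_patches`, and the code table are hypothetical.

```python
from math import ceil
from typing import List

# Hypothetical cell codes: 0 is padding, 5 marks any symbol outside A/C/G/T.
BASE_TO_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}
PAD, UNKNOWN = 0, 5


def render_dna_page(seq: str, width: int) -> List[List[int]]:
    """Wrap a DNA string into rows of `width` cells, padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    codes = [BASE_TO_CODE.get(base, UNKNOWN) for base in seq.upper()]
    rows = [codes[i:i + width] for i in range(0, len(codes), width)]
    if rows:
        rows[-1] = rows[-1] + [PAD] * (width - len(rows[-1]))
    return rows


def count_patches(page: List[List[int]], patch: int) -> int:
    """Number of patch x patch tiles needed to cover the page."""
    if not page:
        return 0
    return ceil(len(page) / patch) * ceil(len(page[0]) / patch)


page = render_dna_page("ACGTNAC", width=4)
print(page)                    # [[1, 2, 3, 4], [5, 1, 2, 0]]
print(count_patches(page, 2))  # 2
```

In this toy layout, seven bases occupy two patches, which hints at how a two-dimensional rendering can be covered by fewer units than a one-token-per-base sequence. Any real compression ratio depends on the encoder, not on this tiling alone.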
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Fractal and DNA sequence analysis · Machine Learning in Bioinformatics
