Beyond Conditional Computation: Retrieval-Augmented Genomic Foundation Models with Gengram
Huinan Xu, Xuyang Feng, Junhong Chen, Junchen Liu, Kaiwen Deng, Kai Ding, Shengning Long, Jiaxue Shuai, Zhaorong Li, Shiping Liu, Guirong Xue, Zhan Xiao

TL;DR
Gengram introduces a genomic-specific hashing memory module into foundation models, significantly improving performance and interpretability in functional genomics tasks by explicitly modeling biological motifs.
Contribution
The paper presents Gengram, a novel memory module with a hashing scheme that explicitly encodes genomic motifs, enhancing model performance and biological interpretability.
Findings
Gains of up to 14% across several functional genomics tasks when Gengram is integrated into state-of-the-art GFM backbones.
Gengram's latent space captures meaningful biological representations.
The module generalizes robustly across different model architectures.
Abstract
Current genomic foundation models (GFMs) rely on extensive neural computation to implicitly approximate conserved biological motifs from single-nucleotide inputs. We propose Gengram, a conditional memory module that introduces an explicit and highly efficient lookup primitive for multi-base motifs via a genomic-specific hashing scheme, establishing genomic "syntax". Integrated into the backbone of state-of-the-art GFMs, Gengram achieves substantial gains (up to 14%) across several functional genomics tasks. The module demonstrates robust architectural generalization, while further inspection of Gengram's latent space reveals the emergence of meaningful representations that align closely with fundamental biological knowledge. By establishing structured motif memory as a modeling primitive, Gengram simultaneously boosts empirical performance and mechanistic interpretability, providing a…
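To make the idea of a hashed lookup for multi-base motifs concrete, here is a minimal sketch of one way such a lookup could work. It is not the authors' implementation: the abstract does not specify the hashing scheme, so the base-4 k-mer encoding, the multiplicative hash, the table size, and the names `kmer_ids` and `motif_memory_lookup` are all assumptions made for illustration. How the retrieved vectors are merged into the backbone is likewise left out.

```python
import numpy as np

# Hypothetical nucleotide vocabulary; any other symbol (e.g. "N") is treated as unknown.
BASE_TO_ID = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_ids(seq, k):
    """Encode the k-mer ending at each position as a base-4 integer.

    Positions without a full k-mer, or whose window contains an unknown
    base, get the sentinel value -1.
    """
    ids = np.full(len(seq), -1, dtype=np.int64)
    for i in range(k - 1, len(seq)):
        window = seq[i - k + 1 : i + 1]
        if all(base in BASE_TO_ID for base in window):
            code = 0
            for base in window:
                code = code * 4 + BASE_TO_ID[base]
            ids[i] = code
    return ids


def motif_memory_lookup(seq, table, k, multiplier=2654435761):
    """Retrieve one memory vector per position by hashing its k-mer.

    `table` is a (num_slots, dim) array of memory vectors. The multiplicative
    hash is an illustrative choice and assumes k is small enough (roughly
    k <= 15) that the product fits in a signed 64-bit integer.
    """
    num_slots, dim = table.shape
    ids = kmer_ids(seq, k)
    out = np.zeros((len(seq), dim), dtype=table.dtype)
    valid = ids >= 0
    slots = (ids[valid] * multiplier) % num_slots
    out[valid] = table[slots]  # positions without a valid k-mer stay zero
    return out


rng = np.random.default_rng(0)
memory_table = rng.normal(size=(1024, 8))
retrieved = motif_memory_lookup("ACGTACGTAC", memory_table, k=4)
```

In this sketch the two occurrences of `ACGT` (ending at positions 3 and 7) retrieve the same memory vector, which is the sense in which the lookup is explicit: identical motifs map to identical entries without the network having to recompute them from single nucleotides.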
Taxonomy
Topics: Genomics and Chromatin Dynamics · Evolutionary Algorithms and Applications · Gene expression and cancer classification
