Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
Zhao Yang, Yi Duan, Jiwei Zhu, Ying Ba, Chuan Cao, Bing Su

TL;DR
This paper shows that integrating proximal multimodal epigenomic signals with a novel framework called Prism significantly improves gene expression prediction accuracy using short DNA sequences, challenging the focus on long sequence modeling.
Contribution
The paper introduces Prism, a new framework that effectively integrates diverse epigenomic signals and mitigates confounding effects, achieving state-of-the-art gene expression prediction with short sequences.
Findings
Long sequence modeling can decrease prediction performance.
Proper integration of multimodal signals improves accuracy.
Prism outperforms existing methods on gene expression prediction.
Abstract
Gene expression prediction, which predicts mRNA expression levels from DNA sequences, presents significant challenges. Previous works often focus on extending input sequence length to locate distal enhancers, which may influence target genes from hundreds of kilobases away. Our work first reveals that for current models, long sequence modeling can decrease performance. Even carefully designed algorithms only mitigate the performance degradation caused by long sequences. Instead, we find that proximal multimodal epigenomic signals near target genes prove more essential. Hence we focus on how to better integrate these signals, which has been overlooked. We find that different signal types serve distinct biological roles, with some directly marking active regulatory elements while others reflect background chromatin patterns that may introduce confounding effects. Simple concatenation may…
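To make the confounding idea concrete, here is a minimal sketch of generic backdoor adjustment with uniform weights over a set of background states. It is not the Prism implementation: `backdoor_average`, `toy_predict`, and the numeric background states are hypothetical names and values chosen only for illustration, and the sketch assumes the predictor can be evaluated once per background state.

```python
def backdoor_average(predict, x, background_states):
    """Average predict(x, z) over all background states z with weight 1/n.

    This is the textbook backdoor adjustment with a uniform prior over z,
    not the paper's actual model.
    """
    n = len(background_states)
    if n == 0:
        raise ValueError("need at least one background state")
    return sum(predict(x, z) for z in background_states) / n


# Hypothetical toy predictor: the output depends on the regulatory
# signal x plus an offset contributed by the background state z.
def toy_predict(x, z):
    return 2.0 * x + z


# Three illustrative background states whose offsets cancel on average.
states = [-1.0, 0.0, 1.0]
adjusted = backdoor_average(toy_predict, 1.5, states)
print(adjusted)  # 3.0
```

Because every background state gets the same weight, the averaged prediction no longer depends on which background state happens to co-occur with a given signal, which is the effect that simple concatenation of signals does not provide.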
Peer Reviews
Decision·ICLR 2026 Oral
1. The introduction of confounder components for gene expression prediction and their connection to biological intuition is important, as it completes the current causal relationship formulation of the epigenomic signal. 2. The observation regarding the sequence length required for CAGE prediction is interesting and biologically reasonable. The provided experiments support this observation on the K562 cell line for gene expression CAGE prediction. I still have some doubts about whether a shorte
The overall framework appears well designed and complete, and I have no further comments regarding potential improvements. My remaining concern lies in how to determine the appropriate sequence length for different prediction tasks. Furthermore, if the goal is to train a unified model for general gene expression prediction, it would be helpful to clarify how the model can adapt to varying sequence length requirements across different genes or datasets.
1. Clear motivation. The paper picks up on a prevalent issue in long-context DNA sequence modelling. The authors narrow in on the key issue and validate it experimentally. 2. Within gene expression prediction, using latent background-state weights + uniform backdoor averaging is relatively novel. 3. While most baselines are trained and reported at 200k bp, Prism runs at 2k bp and still beats prior SOTA (Table 1). This supports their claim that better multi-modal integration can offset lo
1. Prism completely discards long-range sequence information by design, operating on only 2kbp. This is presented as a strength, but I believe that this is also a fundamental limitation. The model cannot discover regulatory elements or sequence variations beyond its 2kbp window unless their effects are already captured by the provided proximal epigenomic signals. Have the authors explored how the metrics change when we increase the context? Why was 2k chosen? 2. Results in Table 1 are based on
The paper presents a clear and well-supported argument that genomic sequence models do not significantly benefit from longer input sequences, a strong claim that is convincingly demonstrated through extensive experimentation (specifically Table 12). The results showing the impact of epigenetic markers are compelling and supported by thorough ablation studies that highlight the individual contribution of each signal type. The analysis of the confounding effect is insightful, and the proposed solu
My concerns are about the efficiency of the proposed approach. As shown in the hyperparameter sensitivity analysis (Section 4.3), the variation in performance when tuning the parameters \alpha and \beta appears minimal, suggesting limited sensitivity to these design choices. Similarly, the number of background states n has only a minor impact on results, as even the case n=0 in Table 2a performs comparably well. This raises questions about how essential the proposed causal intervention mechanism
Taxonomy
Topics: Genomics and Chromatin Dynamics · Machine Learning in Bioinformatics · Gene expression and cancer classification
