Fair splits flip the leaderboard: CHANRG reveals limited generalization in RNA secondary-structure prediction
Zhiyuan Chen, Zhenfeng Deng, Pan Deng, Yue Liao, Xiu Su, Peng Ye, Xihui Liu

TL;DR
This paper introduces CHANRG, a comprehensive benchmark revealing that current RNA secondary-structure prediction models struggle to generalize across different RNA families, highlighting the need for more robust methods.
Contribution
The paper presents CHANRG, a large-scale, structurally non-redundant RNA benchmark with a new evaluation framework that exposes limitations in existing prediction models' generalization capabilities.
Findings
Foundation-model methods achieve the highest in-distribution (held-out) accuracy.
All models lose significant accuracy out-of-distribution.
Structured decoders and direct neural predictors remain markedly more robust out of distribution.
Abstract
Accurate prediction of RNA secondary structure underpins transcriptome annotation, mechanistic analysis of non-coding RNAs, and RNA therapeutic design. Recent gains from deep learning and RNA foundation models are difficult to interpret because current benchmarks may overestimate generalization across RNA families. We present the Comprehensive Hierarchical Annotation of Non-coding RNA Groups (CHANRG), a benchmark of 170,083 structurally non-redundant RNAs curated from more than 10 million sequences in Rfam 15.0 using structure-aware deduplication, genome-aware split design and multiscale structural evaluation. Across 29 predictors, foundation-model methods achieved the highest held-out accuracy but lost most of that advantage out of distribution, whereas structured decoders and direct neural predictors remained markedly more robust. This gap persisted after controlling for sequence…
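To illustrate why split design matters for the generalization claim above, here is a minimal sketch of a family-held-out split, in which every RNA family lands entirely in either the training set or the test set. It is not the CHANRG procedure: the paper's genome-aware split design is not described in this summary, and the function name `family_holdout_split`, its parameters, and the `FAM_*` labels are hypothetical placeholders.

```python
import random
from collections import defaultdict


def family_holdout_split(records, test_fraction=0.2, seed=0):
    """Split (seq_id, family) pairs so that no family appears on both sides.

    Illustrative only: a generic group-level split, not CHANRG's own method.
    """
    # Group sequence IDs by their family label.
    by_family = defaultdict(list)
    for seq_id, family in records:
        by_family[family].append(seq_id)

    # Shuffle families (not sequences) with a fixed seed for reproducibility.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)

    # Hold out whole families; always keep at least one for testing.
    n_test = max(1, round(len(families) * test_fraction))
    test_families = set(families[:n_test])

    train = [s for f in families if f not in test_families for s in by_family[f]]
    test = [s for f in families if f in test_families for s in by_family[f]]
    return train, test, test_families


# Hypothetical toy data: five sequences drawn from three families.
records = [
    ("s1", "FAM_A"), ("s2", "FAM_A"),
    ("s3", "FAM_B"), ("s4", "FAM_B"),
    ("s5", "FAM_C"),
]
train_ids, test_ids, held_out = family_holdout_split(records, test_fraction=0.34)
```

A random per-sequence split would instead let members of the same family appear in both sets, which is one way a benchmark can overestimate generalization across RNA families.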
Taxonomy
Topics: RNA and protein synthesis mechanisms · RNA modifications and cancer · Machine Learning in Bioinformatics
