Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
Junjie Xu, Artem Moskalev, Tommaso Mansi, Mangal Prakash, Rui Liao

TL;DR
This paper systematically evaluates how incorporating 2D and 3D geometric information improves RNA property prediction, demonstrating that geometry-aware models outperform sequence-only models especially in low-data and noisy scenarios.
Contribution
It introduces a curated RNA dataset with structural annotations and provides the first comprehensive evaluation of geometric encoding in RNA property prediction models.
Findings
Geometry-aware models reduce prediction RMSE by ~12% relative to sequence-only models.
Explicit geometry encoding excels in low-data and partial labeling regimes.
Sequence-based models are more robust to sequencing noise.
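To make the ~12% figure concrete, the sketch below shows how a relative RMSE reduction between two models can be computed. The helper names `rmse` and `relative_reduction` are hypothetical, and the numbers in the usage note are made up for illustration; none of this is taken from the paper's code or results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, improved_rmse):
    """Fraction by which improved_rmse is lower than baseline_rmse."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return (baseline_rmse - improved_rmse) / baseline_rmse
```

For example, if a hypothetical sequence-only baseline had an RMSE of 1.00 and a geometry-aware model had 0.88, `relative_reduction(1.00, 0.88)` would come out at roughly 0.12, which is what a 12% reduction means.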
Abstract
Accurate prediction of RNA properties, such as stability and interactions, is crucial for advancing our understanding of biological processes and developing RNA-based therapeutics. RNA structures can be represented as 1D sequences, 2D topological graphs, or 3D all-atom models, each offering different insights into its function. Existing works predominantly focus on 1D sequence-based models, which overlook the geometric context provided by 2D and 3D geometries. This study presents the first systematic evaluation of incorporating explicit 2D and 3D geometric information into RNA property prediction, considering not only performance but also real-world challenges such as limited data availability, partial labeling, sequencing noise, and computational efficiency. To this end, we introduce a newly curated set of RNA datasets with enhanced 2D and 3D structural annotations, providing a…
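The difference between a 1D sequence and a 2D topological graph can be sketched in a few lines. The example below assumes the secondary structure is given in dot-bracket notation, a common convention that this summary does not confirm the paper uses. The function name `dot_bracket_to_edges` and the edge labels are hypothetical, and pseudoknots are not handled.

```python
def dot_bracket_to_edges(sequence, structure):
    """Turn an RNA sequence plus a dot-bracket string into a list of graph edges.

    Nodes are nucleotide indices. Each edge is (i, j, kind), where kind is
    "backbone" for consecutive nucleotides or "pair" for a base pair.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    # 1D context: consecutive nucleotides are linked along the backbone.
    edges = [(i, i + 1, "backbone") for i in range(len(sequence) - 1)]

    # 2D context: matching parentheses mark base-paired positions.
    stack = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            edges.append((stack.pop(), i, "pair"))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol: {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return edges
```

A sequence-only model would see just the string, while a graph-based model would also see the `"pair"` edges. A 3D model would additionally attach atom or nucleotide coordinates to the nodes, which is omitted here.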
Peer Reviews
Decision: ICLR 2025 Poster
1. The authors investigated how using both 2D and 3D data improves RNA property prediction, and explored how the corresponding models degrade under various factors, including noisy data and partial labels. 2. The authors collected a substantial amount of RNA sequence data across a wide range of nucleotide count intervals, ensuring comprehensive coverage. 3. The authors conducted extensive experiments using various models on the dataset to su
1. The authors used a limited number of 3D models for geometric structure modeling, most of which are relatively early models, and neither of the two models (EGNN and SchNet) is specifically designed for 3D RNA structure modeling. Therefore, I believe their performance does not fully reflect the potential improvements offered by geometric information across various datasets. The authors should validate models specifically designed for RNA 3D structure modeling, such as ARES [1] and
1. This study provides a thorough comparison of 1D, 2D, and 3D models, showcasing their respective strengths and weaknesses in handling RNA data. 2. This study provides a comprehensive analysis of various deep learning models, assessing their performance under different conditions, including limited data and labels, different types of sequencing errors, and out-of-distribution scenarios, which is crucial for real-world applications. 3. The commitment to transparency and reproducibility by making
1. The article lacks methodological innovation, missing deep improvements on existing technologies and novel algorithm designs. 2. The article merely compares various metrics, and the key points are not sufficiently emphasized. 3. Both the secondary and tertiary structures are predicted using software, particularly the tertiary structure, which is not very accurate. This can lead to significant uncertainties in further property predictions.
For the field of AI for Science, high-quality datasets are a very important resource. This article integrates four datasets, which contain a large number of data samples of various types.
W1. Lack of explanation for RNA's uniqueness. The task defined in Section 3 appears to apply to sequence molecules in general rather than to RNA specifically. Is there any fundamental methodological difference from other sequence molecules (such as proteins and DNA), beyond a slightly different molecular composition?
W2. The methods used are relatively old. There are many newer works on 1D sequences and 2D topological graphs that are not discussed here.
Taxonomy
Topics: RNA and protein synthesis mechanisms · RNA modifications and cancer · RNA Research and Splicing
Methods: Focus · Sparse Evolutionary Training
