FRET-guided selection of RNA 3D structures

Mirko Weber; Felix Erichson; Maciej Antczak; Vanessa Schumann; Josephine Meitzner; Tomasz Zok; Fabio D Steffen; Marta Szachniuk; Richard Börner

PMC · DOI: 10.1093/nar/gkag147 · February 25, 2026


Open Access

TL;DR

This paper introduces a method using FRET data to guide the selection of accurate RNA 3D structures from computational models.

Contribution

A novel FRET-guided workflow is proposed to identify RNA conformations consistent with experimental data.

Findings

01

FRET distributions predicted from RNA models matched experimental smFRET data.

02

The workflow successfully identified RNA conformational states compatible with observed FRET states.

03

The method improves the accuracy of RNA 3D modeling by integrating experimental FRET data.

Abstract

Integrative biomolecular modeling of RNA relies on refined structural collections and accurate experimental data that reflect binding and folding behavior. However, predicting such collections remains challenging due to the rugged energy landscape and extensive conformational heterogeneity of large RNAs. To overcome these limitations, we applied a Förster resonance energy transfer (FRET)-guided strategy to identify RNA conformational states consistent with single-molecule FRET (smFRET) experiments. We predicted 3D structures of a ribosomal RNA tertiary contact comprising a GAAA tetraloop and a kissing loop using three popular RNA 3D modeling tools, namely RNAComposer, FARFAR2, and AlphaFold3, yielding a collection of candidate conformations. These models were structurally validated based on Watson–Crick base-pairing patterns and filtered using an eRMSD threshold. For each retained…


Figures

Figure 1. Workflow for selecting smFRET-consistent RNA 3D structure collections. The pipeline starts from a sequence and yields a final collection of 3D structures that best represent the experimental smFRET data under the condition of a correctly folded KL motif. (A) RNA sequence of the investigated model construct, highlighting all structural elements. (B) Secondary structure representation of the construct. (C) Initial structure collection of 1000 RNA 3D models predicted by RNAComposer. (D) Evaluation of predicted structures using Barnaba to ensure correctly folded KLs via WC base pairing and the calculation of the eRMSD relative to the reference structure. (E) For each filtered structure, the mACVs of the dyes Cy3 and Cy5 were predicted, and the mean FRET efficiency was calculated. (F) Distribution of the $E_{\mathrm{DA}}$ values calculated from the mACVs, shown alongside the experimental smFRET distribution with corresponding bin populations. (G) Reweighting of the structure collection by sampling structures from each populated bin according to the probabilities observed in the experimental smFRET distribution. (H) Comparison of FRET efficiencies after photon sampling with FRETraj for the unweighted and weighted $E_{\mathrm{DA}}$ distributions.

Figure 2. Distribution of energy-transfer efficiencies $E_{\mathrm{DA}}$ calculated from ACV-derived mean donor–acceptor distances. Distributions are shown for RNAComposer, FARFAR2, and AlphaFold3, and compared to the corresponding reference distributions obtained from the full structure collections ($N = 10\,000$ structures). (A) KLD of randomly sampled subsets ranging from 10 to the corresponding full set of 10 000 structures for each predictive tool. Dashed vertical lines indicate the subset sizes $i = 100, 500, 1000, \text{and } 10\,000$ used to generate the $E_{\mathrm{DA}}$ distributions shown in (B).

Figure 3. Reference KL fold and its comparison with the RNAComposer structure collection. (A) (left) Cryo-EM reference structure of the KL, comprising WC base pairs, except A27–U59. (right) Preservation of the KL motif across six $1\,\mu\mathrm{s}$ MD simulation sets relative to the cryo-EM reference structure comprising WC base pairs for the complete KL. (B) eRMSD of all RNAComposer structures (left) and RNAComposer structure collection after $1\,\mathrm{ns}$ MD refinement (right) compared to the reference cryo-EM structure, focusing on the KL region. (C) Representative structures classified as preserved or altered KLs, showing those with the lowest and highest eRMSD, both before and after MD refinement.

Figure 4. Comparison of 3D structure prediction tools and MD simulations with the smFRET experiment. (A) Energy-transfer efficiency $E_{\mathrm{DA}}$ distributions for all approaches without photon sampling, and the final 3D structure collections after filtering, presented with and without ACVs. (B) FRET distributions with photon statistics simulated by FRETraj. “Unweighted” refers to a uniform contribution of each structure to the burst calculation, whereas “weighted” denotes sampling according to experimental smFRET probabilities. (C) Radar charts showing the contribution of individual FRET-selected structures to the weighted collections for each tool. The FARFAR2 collection includes many diverse structures differing in energy-transfer efficiency, whereas RNAComposer and AlphaFold3 provide mostly similar structures with a narrow energy-transfer efficiency distribution. For FARFAR2, the FRET-guided selection yields fewer individual structures, each highly populated, whereas for RNAComposer and AlphaFold3 the number of selected structures is comparably high but uniformly distributed in the weighted FRET distribution. The population of structures from MD simulations is comparable to the FARFAR2 collection, but more structures are selected during weighting, each with a correspondingly lower percentage in the weighted FRET distribution. (D) Number of structures lost during KL validation and the final number selected for weighting. KL filtering was not applied to MD simulation structures.

Tables

Table 1. FRET measurements for all approaches

	w/o photon sampling	w/ photon sampling
		Unweighted	Weighted
RNAComposer	0.26 ± 0.08	0.23 ± 0.05	0.29 ± 0.06
FARFAR2	0.69 ± 0.19	0.77 ± 0.07	0.32 ± 0.06
AlphaFold3	0.27 ± 0.08	0.24 ± 0.05	0.30 ± 0.06
MD simulation	0.60 ± 0.24	0.65 ± 0.08	0.31 ± 0.06
smFRET	–	0.32 ± 0.14

Equations

$$D_{\mathrm{KLD}}\left(P \Vert Q_{x_i}^{(r)}\right) = \sum_{j=1}^{m} p_j \log\left(\frac{p_j}{q_j^{(r)}}\right),$$

$$\overline{D_{\mathrm{KLD}}}(x_i) = \frac{1}{k_i} \sum_{r=1}^{k_i} D_{\mathrm{KLD}}\left(P \Vert Q_{x_i}^{(r)}\right),$$
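The two KLD expressions can be illustrated with a short NumPy sketch. It histograms the efficiencies of a full collection to obtain $P$, draws random subsets of a given size, and averages the divergence over repetitions. The bin edges, the number of repetitions, and the small pseudocount that keeps empty subset bins finite are assumptions made here for illustration; they are not taken from the paper, and the function names are hypothetical.

```python
import numpy as np


def histogram_probs(values, bins, eps=1e-12):
    """Bin probabilities of `values`; `eps` is an assumed pseudocount
    that keeps empty bins from producing log(0) or division by zero."""
    counts, _ = np.histogram(values, bins=bins)
    probs = counts.astype(float) + eps
    return probs / probs.sum()


def kld(p, q):
    """Kullback-Leibler divergence D(P || Q) = sum_j p_j * log(p_j / q_j)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def mean_subset_kld(e_values, subset_size, repetitions, bins, rng):
    """Mean KLD between the full-collection distribution P and the
    distributions Q of `repetitions` random subsets of `subset_size`."""
    e_values = np.asarray(e_values, dtype=float)
    p = histogram_probs(e_values, bins)
    total = 0.0
    for _ in range(repetitions):
        # Draw a subset without replacement from the full collection.
        subset = rng.choice(e_values, size=subset_size, replace=False)
        total += kld(p, histogram_probs(subset, bins))
    return total / repetitions
```

Plotting `mean_subset_kld` against the subset size would give a convergence curve of the kind described for Fig. 2A; a subset equal to the full collection reproduces $P$ and therefore yields a divergence of zero.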

$$P_{\mathrm{unweighted}}(i) = \frac{|B_i|}{N},$$

$$P_{\mathrm{weighted}}(i) = \begin{cases} p_i^{\mathrm{smFRET}} & \text{if } |B_i| > 0, \\ 0 & \text{otherwise,} \end{cases}$$
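The unweighted and weighted bin probabilities can be sketched as follows, where $B_i$ is read as the set of model structures whose predicted efficiency falls into bin $i$. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the predicted efficiencies are assumed to lie within the bin edges, and renormalising the weighted probabilities over the populated bins before sampling is an assumption made here so that they sum to one.

```python
import numpy as np


def bin_probabilities(e_model, p_smfret, bins):
    """Return per-structure bin indices, P_unweighted and P_weighted.

    e_model  : predicted efficiencies, one per structure (assumed in range)
    p_smfret : experimental smFRET probability of each bin
    bins     : bin edges shared by model and experiment
    """
    e_model = np.asarray(e_model, dtype=float)
    p_smfret = np.asarray(p_smfret, dtype=float)
    n_bins = len(bins) - 1
    # Bin index per structure; the right-most edge belongs to the last bin.
    bin_index = np.clip(np.digitize(e_model, bins) - 1, 0, n_bins - 1)
    counts = np.bincount(bin_index, minlength=n_bins)  # |B_i|
    p_unweighted = counts / e_model.size               # |B_i| / N
    p_weighted = np.where(counts > 0, p_smfret, 0.0)   # p_i^smFRET if populated
    return bin_index, p_unweighted, p_weighted


def sample_weighted_collection(e_model, p_smfret, bins, n_draws, rng):
    """Draw structure indices: pick a populated bin with the experimental
    probability, then pick one of its member structures uniformly."""
    bin_index, _, p_weighted = bin_probabilities(e_model, p_smfret, bins)
    total = p_weighted.sum()
    if total == 0:
        raise ValueError("no populated bin overlaps the experimental distribution")
    bin_p = p_weighted / total  # assumed renormalisation over populated bins
    members = [np.flatnonzero(bin_index == b) for b in range(len(bin_p))]
    chosen_bins = rng.choice(len(bin_p), size=n_draws, p=bin_p)
    return np.array([rng.choice(members[b]) for b in chosen_bins])
```

Structures in bins with zero experimental probability are never drawn, while a sparsely populated bin with high experimental probability causes its few members to be drawn repeatedly.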

Funding

- European Social Fund Plus (DOI: 10.13039/501100004895)
- German Research Foundation (DOI: 10.13039/501100001659)
- Mittweida University of Applied Sciences
- Laserinstitut Hochschule Mittweida
- Poznan University of Technology (DOI: 10.13039/501100004239)
- Polish Academy of Sciences (DOI: 10.13039/501100004382)
- National Science Centre, Poland (DOI: 10.13039/501100004442)
- German Research Foundation (DOI: 10.13039/501100001659)


Taxonomy

Topics: RNA and protein synthesis mechanisms · Protein Structure and Dynamics · DNA and Nucleic Acid Chemistry

Full text

Introduction

Three-dimensional (3D) structure prediction and molecular dynamics (MD) simulations have been instrumental in resolving biomolecular structures at atomic resolution [1]. While these methods have been particularly impactful in protein science [2–4], increasing attention is now being directed toward RNA and RNA–protein complexes due to their structural complexity and functional diversity [5, 6]. In response, RNA 3D structure prediction has become a rapidly evolving field, with sustained efforts to enhance accuracy, reliability, and robustness. A key development in this area was the launch of RNA-Puzzles in 2011 [7], a community-wide initiative for blind benchmarking of RNA 3D prediction tools against experimentally determined reference structures [8–11]. This effort has significantly accelerated the field by enabling objective performance evaluation and encouraging methodological advancements [12]. Despite a growing range of computational strategies, modeling RNA structures remains dependent on experimental data from high-resolution techniques such as X-ray crystallography [13], nuclear magnetic resonance spectroscopy [14], and cryo-electron microscopy (cryo-EM) [15]. As the demand for accurate modeling of flexible and functionally relevant RNA conformations continues to grow, the integration of predictive modeling with experimental validation remains a cornerstone of progress in RNA structural biology.

Although in silico prediction and MD simulations provide critical insights into the 3D organization of RNA, they often focus on identifying a single, energetically favorable state or sampling a limited subspace of the whole conformational landscape. This single-state representation is inadequate for capturing the structural heterogeneity observed in many functional RNAs, where multiple transiently populated conformations coexist [16]. To address these limitations, in-solution techniques such as small-angle X-ray scattering [17, 18], electron paramagnetic resonance (EPR) [19, 20], and Förster resonance energy transfer (FRET) [21, 22] serve as complementary approaches for probing conformationally heterogeneous RNA structure collections. These methods enable structural characterization and are particularly well suited for detecting dynamic behavior [23, 24]. EPR and single-molecule FRET (smFRET), in particular, provide access to highly precise, site-specific distance information that is critical for resolving low-populated or transient states. Within integrative or hybrid modeling approaches [25, 26], such precise distance constraints help to select and validate conformations from predicted structure collections. Among available experimental techniques, smFRET combines nanometer-scale resolution with single-molecule sensitivity, enabling direct comparison between predicted and experimental distances or entire distance distributions [27]. This makes smFRET particularly well suited for detecting conformational subpopulations underlying RNA dynamics.

RNA folding is a hierarchical, dynamic process in which proteins can assist folding by stabilizing native-like intermediates, and tertiary interactions are essential for establishing compact RNA architectures [28]. In particular, tertiary contacts between distant secondary structure elements may form transiently or only in the presence of stabilizing factors such as Mg(II) ions or RNA-binding proteins [29–32]. As a consequence, RNAs populate heterogeneous ensembles of interconverting unbound conformations rather than a single well-defined structure. Experimental characterization of such conformational collections remains challenging due to their high flexibility and structural diversity. In this context, single-molecule techniques such as smFRET identify conformations that are consistent with distance constraints measured in solution [22, 27, 33, 34].

While FRET-assisted and integrative hybrid modeling concepts are well established, the present study develops a predictor-agnostic workflow for post-hoc structure collection selection from de novo RNA 3D structure prediction outputs. Rather than fitting restraints within a single modeling engine, we combine (i) motif- and geometry-based plausibility filtering with (ii) forward modeling of smFRET efficiency distributions using explicit dye-accessibility modeling and experimentally refined photophysical parameters. This approach enables direct comparison of how different prediction algorithms populate structural diversity and how this diversity affects structure selection, consistent with FRET. Importantly, our goal is to describe solution-state heterogeneity, particularly for unbound and flexible states, rather than to infer a unique atomic structure from a single FRET observable.

Here, we apply FRET-guided integrative modeling to a ribosomal RNA model construct featuring a kissing loop (KL) and a highly flexible GAAA tetraloop (TL$_{\mathrm{GAAA}}$) domain, which can serve as a tertiary contact by binding to the KL [32]. We aim to characterize this construct structurally in its unbound state. We chose it because it combines a structurally anchored KL motif, for which high-resolution ribosomal cryo-EM structures are available, with an unstructured region where conformational heterogeneity is expected. This makes it a suitable test case to evaluate whether predictor-derived candidate pools can be rationally filtered and reduced to smFRET-compatible structure collections under well-defined ionic conditions. We chose three prediction algorithms representative of distinct modeling philosophies: RNAComposer [35, 36], an efficient fragment/template-based approach; AlphaFold3 [2], an emerging deep learning-based predictor; and FARFAR2 [37], an explicit conformational sampling method. The observed differences in ensemble breadth therefore primarily reflect design goals rather than a nongeneralizable superiority ranking. FARFAR2, as a sampling-oriented method, employs a stepwise Monte Carlo sampling procedure guided by a scoring function, which inherently generates structurally diverse collections, albeit at high computational cost. RNAComposer and AlphaFold3, as efficiency-oriented predictors, do not explicitly perform conformational sampling; instead, they typically return a small set of high-confidence/low-energy solutions rather than an explicitly sampled ensemble [11, 27]. For model validation, we calculated the multiple accessible contact volume (mACV) [27, 38, 39] to predict in silico FRET from the dye-labeled RNA models and used these predictions to identify an initial pool of candidate structures matching the experimental smFRET distribution. To ensure structural plausibility, we applied established validation metrics, including Watson–Crick (WC) base-pairing analysis [29] and the eRMSD score [40, 41], yielding a refined collection suitable for smFRET-guided conformer selection.
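To make the step from dye positions to a predicted FRET efficiency concrete, the sketch below applies the Förster relation $E = 1 / (1 + (R/R_0)^6)$ to two clouds of dye positions and averages over all donor–acceptor pairs. It is a simplified stand-in for the mACV calculation, which is performed with FRETraj in the paper: the uniformly weighted point clouds, the function names, and any Förster radius passed in are assumptions for illustration, and this excerpt does not specify whether efficiencies or distances are averaged first.

```python
import numpy as np


def fret_efficiency(distance, r0):
    """Förster relation E = 1 / (1 + (R / R0)^6); both in the same unit."""
    distance = np.asarray(distance, dtype=float)
    return 1.0 / (1.0 + (distance / r0) ** 6)


def mean_fret_efficiency(donor_xyz, acceptor_xyz, r0):
    """Mean efficiency over all donor-acceptor position pairs.

    donor_xyz, acceptor_xyz : (n, 3) and (m, 3) arrays of dye positions,
    treated here as equally weighted (an assumption; mACV weights differ).
    """
    donor_xyz = np.asarray(donor_xyz, dtype=float)
    acceptor_xyz = np.asarray(acceptor_xyz, dtype=float)
    # Pairwise distance matrix of shape (n, m) via broadcasting.
    diff = donor_xyz[:, None, :] - acceptor_xyz[None, :, :]
    distances = np.linalg.norm(diff, axis=-1)
    return float(fret_efficiency(distances, r0).mean())
```

Because of the sixth-power dependence, a pair separated by exactly $R_0$ contributes an efficiency of 0.5, and pairs at half or twice that distance contribute values close to 1 or 0, respectively.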

Ultimately, we used the experimentally measured smFRET distribution to guide structure selection from the validated collections. Instead of comparing FRET values post hoc, we applied the distribution as a filter to extract conformers whose ACV-based predictions matched the experimental data. By enabling targeted sampling within the conformational landscape resolved by smFRET, this approach advances RNA structural modeling beyond static representations toward a dynamic, experimentally grounded collection.

Methods

Probing the unbound state with smFRET

smFRET measurements of the Cy3/Cy5-labeled ribosomal RNA model construct, designed to probe KL formation and potential TL$_{\mathrm{GAAA}}$ binding, were performed and analyzed according to standard protocols [23, 27] on a home-built fluorescence microscope. The construct was prepared as described in [32] and is visualized in Fig. 1A and B. To allow for molecular sorting, pulsed overlaid excitation (POE) smFRET experiments were performed in standard buffer containing [K$^{+}$] = $116\,\mathrm{mmol}\,\mathrm{l}^{-1}$ and ethylenediaminetetraacetic acid to allow KL formation without TL$_{\mathrm{GAAA}}$ binding (Supplementary Figs S1 and S2). The smFRET data were corrected for background and crosstalk, including bleed-through and direct excitation. A $\gamma$-correction was applied to account for differences in detection efficiency and quantum yield of the FRET dye pair (Supplementary Table S1). These corrections are essential for comparing the experimental data with FRET distributions derived from in silico structure collections using FRETraj [39] together with the experimental burst size distribution (Supplementary Fig. S7B), i.e. the sum of donor and acceptor intensities (Fig. 1F). Additionally, the fluorescence lifetime of the Cy3/Cy5-labeled ribosomal RNA model construct was measured by time-correlated single-photon counting (TCSPC) using a FluoTime 250, and time-resolved fluorescence anisotropy decays were calculated to investigate dye–RNA stacking (Supplementary Figs S3 and S4, and Supplementary Tables S2 and S3) and to calculate the CV fraction according to [27, 38].
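The correction chain described above can be sketched with a commonly used formulation for alternating-excitation smFRET data. The channel names, the parameter symbols (`alpha` for donor bleed-through, `delta` for direct acceptor excitation, `gamma` for the detection correction), and the function itself are assumptions for illustration; the authors' actual correction factors are listed in their Supplementary Table S1 and are not reproduced here.

```python
def corrected_fret_efficiency(i_dd, i_da, i_aa, alpha, delta, gamma,
                              bg_dd=0.0, bg_da=0.0, bg_aa=0.0):
    """Corrected FRET efficiency of a single burst.

    i_dd : donor emission after donor excitation
    i_da : acceptor emission after donor excitation
    i_aa : acceptor emission after acceptor excitation
    alpha: donor bleed-through into the acceptor channel
    delta: direct acceptor excitation by the donor laser
    gamma: relative detection efficiency and quantum yield of the dyes
    bg_* : background counts of the respective channel
    """
    # 1) Background subtraction per channel.
    dd = i_dd - bg_dd
    da = i_da - bg_da
    aa = i_aa - bg_aa
    # 2) Crosstalk: remove bleed-through and direct excitation.
    f_da = da - alpha * dd - delta * aa
    # 3) Gamma-corrected efficiency.
    return f_da / (gamma * dd + f_da)


if __name__ == "__main__":
    # Illustrative burst with made-up photon counts and correction factors.
    print(corrected_fret_efficiency(100, 100, 200,
                                    alpha=0.1, delta=0.05, gamma=2.0))
```

In this formulation, the burst size used for photon sampling corresponds to the sum of the corrected donor and acceptor signals after donor excitation.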

Figure 1. Workflow for selecting smFRET-consistent RNA 3D structure collections. The pipeline starts from a sequence and yields a final collection of 3D structures that best represent the experimental smFRET data under the condition of a correctly folded KL motif. (A) RNA sequence of the investigated model construct, highlighting all structural elements. (B) Secondary structure representation of the construct. (C) Initial structure collection of 1000 RNA 3D models predicted by RNAComposer. (D) Evaluation of predicted structures using Barnaba to ensure correctly folded KLs via WC base pairing and the calculation of the eRMSD relative to the reference structure. (E) For each filtered structure, the mACVs of the dyes Cy3 and Cy5 were predicted, and the mean FRET efficiency was calculated. (F) Distribution of the $E_{\mathrm{DA}}$ values calculated from the mACVs, shown alongside the experimental smFRET distribution with corresponding bin populations. (G) Reweighting of the structure collection by sampling structures from each populated bin according to the probabilities observed in the experimental smFRET distribution. (H) Comparison of FRET efficiencies after photon sampling with FRETraj for the unweighted and weighted $E_{\mathrm{DA}}$ distributions.

In silico RNA 3D structure prediction and MD simulation

Structure collections consisting of 10 000 structures each were predicted using RNAComposer, FARFAR2, and AlphaFold3 (Supplementary Methods). Additionally, we performed six independent $1\,\mu\mathrm{s}$ MD simulations with different initial seed structures chosen to represent structurally diverse conformations (Supplementary Methods). An equal number of structures was sampled from each of the six MD simulations for comparison with the 3D prediction tools and for subsequent use in the FRET-guided reweighting. ACVs of both dyes were calculated for each structure predicted by the 3D prediction tools, as well as every $100\,\mathrm{ps}$ along the MD trajectories. Photon emission events were simulated by FRETraj [39, 42, 43]. The FRET distributions were also analyzed both individually for each MD simulation (Supplementary Figs S6 and S7) and as a combined collection (Fig. 4A and B, MD simulation). All MD simulations were annotated every $100\,\mathrm{ps}$ based on WC base pairing using the Barnaba toolbox [41], resulting in six sets of 10 000 annotated structures representing the unbound RNA conformation (averaged in Fig. 3A right).

Estimating structure collection size using Kullback–Leibler divergence

The foundation of our approach to selecting diverse structures that match the experimental distribution of energy-transfer efficiencies is the diversity of the underlying generated 3D structure collection. Our underlying assumption is that $N = 10\,000$ predicted structures are sufficient to adequately cover the conformational search space sampled by the prediction tools considered. To assess the minimum number of structures required for a representative subset, and thus to keep the computational cost to a minimum, we compared, for each prediction tool, the probability distribution $P$ of the energy-transfer efficiencies $E_{\mathrm{DA}}$ calculated from the initial structure collection of size $N$ with the distributions $Q_{x_i}$ of randomly sampled subsets. This comparison allowed us to assess how well smaller subsets capture the variability of $E_{\mathrm{DA}}$ observed in the full collection. For each $i \in \{10, 20, 30, \dots, 9990\}$, $x_i$ denotes the number of structures randomly drawn from the structure collection of size $N$. To compare the probability distributions, we applied the Kullback–Leibler divergence (KLD) [44], which is computed for each repetition $r$ as:

$$D_{\mathrm{KLD}}(P \Vert Q_{x_i}^{(r)}) = \sum_{j=1}^{m} p_j \log \left(\frac{p_j}{q_j^{(r)}}\right),$$

where $m$ is the total number of bins, $p_j$ is the true probability of bin $j$, and $q_j^{(r)}$ is the corresponding probability of the bin from the sampled distribution. The mean KLD across all repetitions for a given $x_i$ is then defined as:

$$\overline{D_{\mathrm{KLD}}}(x_i) = \frac{1}{k_i} \sum_{r=1}^{k_i} D_{\mathrm{KLD}}(P \Vert Q_{x_i}^{(r)}),$$

where $k_i$, the number of repetitions, was increased until the standard deviation of the corresponding KLD values fell below 1% of their mean. Using this criterion, we determined the minimum number of structures required for each structure collection to achieve an accurate representation (Fig. 2 and Supplementary Fig. S5 for different bin sizes $\Delta E_{\mathrm{DA}}$).
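The subsampling test above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code used in the study: the function names `histogram`, `kld`, and `mean_kld` are hypothetical, empty bins in the sampled distribution are floored at a small `eps` (the text does not say how empty bins were handled), and the number of repetitions is fixed rather than increased adaptively.

```python
import math
import random
import statistics


def histogram(values, bin_width=0.05):
    """Normalized histogram of efficiencies on [0, 1] with fixed-width bins."""
    n_bins = round(1.0 / bin_width)
    counts = [0] * n_bins
    for e in values:
        j = min(int(e / bin_width), n_bins - 1)  # E = 1.0 goes into the last bin
        counts[j] += 1
    total = len(values)
    return [c / total for c in counts]


def kld(p, q, eps=1e-9):
    """D_KLD(P || Q) = sum_j p_j * log(p_j / q_j).

    Assumption: q_j is floored at eps so that empty sampled bins
    give a large but finite contribution.
    """
    return sum(pj * math.log(pj / max(qj, eps)) for pj, qj in zip(p, q) if pj > 0)


def mean_kld(efficiencies, subset_size, repetitions=50, bin_width=0.05, seed=0):
    """Mean and standard deviation of the KLD over repeated random subsets."""
    rng = random.Random(seed)
    p = histogram(efficiencies, bin_width)
    values = []
    for _ in range(repetitions):
        subset = rng.sample(efficiencies, subset_size)  # drawn without replacement
        values.append(kld(p, histogram(subset, bin_width)))
    return statistics.mean(values), statistics.stdev(values)
```

Calling `mean_kld` for increasing `subset_size` traces a curve of the kind described for Fig. 2A; the subset size at which the curve flattens depends on the chosen bin width.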

Figure 2. Distribution of energy-transfer efficiencies $E_{\mathrm{DA}}$ calculated from ACV-derived mean donor–acceptor distances. Distributions are shown for RNAComposer, FARFAR2, and AlphaFold3, and compared to the corresponding reference distributions obtained from the full structure collections ($N = 10\,000$ structures). (A) KLD of randomly sampled subsets ranging from 10 to the corresponding full set of 10 000 structures for each predictive tool. Dashed vertical lines indicate the subset sizes $i = 100, 500, 1000, \text{and } 10\,000$ used to generate the $E_{\mathrm{DA}}$ distributions shown in (B).

Validating KL formation and filtering structure collections

To ensure formation of the KL as observed in the reference structure (PDB ID: 3JCT and Fig. 3A), we compared the WC base pairs annotated by Barnaba [41] across all predicted structures with those from 10 000 MD-derived states. This analysis validated the structural preservation of the KL motif. WC base pairs were then annotated for all structures from RNAComposer (Fig. 3A and B), FARFAR2 (Supplementary Fig. S8), and AlphaFold3 (Supplementary Fig. S9). Structures were classified as preserved KL if all relevant base pairs were canonical, except for A27–U59, which was allowed to vary in agreement with the reference. Any deviation was annotated as altered KL (Fig. 3B and C for RNAComposer and Supplementary Figs S8 and S9 for FARFAR2 and AlphaFold3).

Figure 3. Reference KL fold and its comparison with the RNAComposer structure collection. (A) (left) Cryo-EM reference structure of the KL, comprising WC base pairs, except A27–U59. (right) Preservation of the KL motif across six $1\,\mu\mathrm{s}$ MD simulation sets relative to the cryo-EM reference structure comprising WC base pairs for the complete KL. (B) eRMSD of all RNAComposer structures (left) and RNAComposer structure collection after $1\,\mathrm{ns}$ MD refinement (right) compared to the reference cryo-EM structure, focusing on the KL region. (C) Representative structures classified as preserved or altered KLs, showing those with the lowest and highest eRMSD, both before and after MD refinement.

Structural similarity of each model to the reference was quantified using eRMSD [40]. Models with $\mathrm{eRMSD} \le 0.8$ and preserved base pairing were selected for further analysis (Fig. 3B left for RNAComposer). To assess structural stability and confirm base-pair preservation after refinement, each filtered model underwent a short $1\,\mathrm{ns}$ MD simulation, followed by re-annotation and recomputation of WC base pairs and eRMSD (Fig. 3B right for RNAComposer, and Supplementary Figs S8 and S9 for FARFAR2 and AlphaFold3, respectively).
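The combined filter can be expressed as a simple predicate over pre-computed annotations. The sketch below assumes that eRMSD values and WC pair annotations have already been obtained (for example with Barnaba) and stored per model; the `Model` container, the function names, and the pair labels in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    ermsd: float          # eRMSD to the reference, restricted to the KL region
    wc_pairs: frozenset   # annotated Watson-Crick pairs, e.g. {("n1", "n2")}


def is_preserved_kl(model, required_pairs, ermsd_cutoff=0.8):
    """True if every required WC pair is present and eRMSD <= cutoff.

    required_pairs should omit A27-U59, which is allowed to vary.
    """
    return required_pairs <= model.wc_pairs and model.ermsd <= ermsd_cutoff


def filter_collection(models, required_pairs, ermsd_cutoff=0.8):
    """Keep only models that pass both the base-pairing and the eRMSD test."""
    return [m for m in models if is_preserved_kl(m, required_pairs, ermsd_cutoff)]
```

For instance, with `required_pairs = frozenset({("n1", "n2"), ("n3", "n4")})` (placeholder labels), a model with eRMSD 0.5 that lacks the pair `("n3", "n4")` is rejected, as is a model with all pairs but eRMSD 0.9.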

Reweighting FRET efficiencies from mACVs

To compare the structure collections obtained from the investigated 3D prediction tools and the described MD simulations with the experimental FRET distribution, we applied FRETraj within the FRET-assisted modeling pipeline [45] to generate a distribution of energy-transfer efficiencies $E_{\mathrm{DA}}$ based on the ACV dye model and using two different structure sampling strategies (compare Fig. 1E–H) [39]. In the first approach, each structure was selected exactly once, thereby preserving the original distribution of predicted energy-transfer efficiencies $E_{\mathrm{DA}}$. Consequently, the relative frequency of structures in each bin $B_i$ reflected the number of structures falling within the corresponding $E_{\mathrm{DA}}$ range, i.e.

$$P_{\mathrm{unweighted}}(i) = \frac{|B_i|}{N},$$

where $|B_i|$ denoted the number of structures in bin $i$, and $N$ was the total number of structures in the filtered collection (Fig. 1F and H, unweighted mACV).
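As a minimal illustration of the unweighted case, the sketch below converts one mean donor–acceptor distance per structure into an efficiency with the standard Förster relation and then bins the result into $P_{\mathrm{unweighted}}$. It is a simplification: FRETraj derives efficiencies from the ACVs themselves, and the Förster radius `r0` must be supplied for the dye pair in use (the value in the usage note is a hypothetical placeholder). Function names are illustrative only.

```python
def fret_efficiency(distance, r0):
    """Foerster relation E = 1 / (1 + (R / R0)^6); both lengths in the same unit."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


def bin_index(e, bin_width=0.05):
    """Index of the fixed-width bin on [0, 1] that contains efficiency e."""
    n_bins = round(1.0 / bin_width)
    return min(int(e / bin_width), n_bins - 1)  # E = 1.0 goes into the last bin


def unweighted_distribution(efficiencies, bin_width=0.05):
    """P_unweighted(i) = |B_i| / N, with every structure counted exactly once."""
    n_bins = round(1.0 / bin_width)
    counts = [0] * n_bins
    for e in efficiencies:
        counts[bin_index(e, bin_width)] += 1
    n = len(efficiencies)
    return [c / n for c in counts]
```

With a placeholder `r0 = 54.0` (in ångström), `unweighted_distribution([fret_efficiency(d, 54.0) for d in distances])` gives the per-bin fractions for a list of distances.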

In the second approach, structures were sampled such that the resulting distribution approximated the experimental smFRET distribution $p^{\mathrm{smFRET}}$ (Fig. 1F–H, weighted mACV):

$$P_{\mathrm{weighted}}(i) = \begin{cases} p_i^{\mathrm{smFRET}} & \text{if } |B_i| > 0, \\ 0 & \text{otherwise,} \end{cases}$$

where $p_i^{\mathrm{smFRET}}$ denoted the smFRET probability of bin $i$, provided that $B_i$ was not empty.

Based on this, two scenarios were distinguished for each bin:

Scenario 1:

If the relative number of structures in $B_i$ exceeded the experimental probability $p_i^{\mathrm{smFRET}}$, a subset of structures was randomly sampled from $B_i$ to match $p_i^{\mathrm{smFRET}}$.

Scenario 2:

If the number of structures in $B_i$ was insufficient to satisfy $p_i^{\mathrm{smFRET}}$, all structures in $B_i$ were selected repeatedly until the required number was reached. Any remaining fraction was filled by randomly sampling additional structures from $B_i$. Each repetition was randomly permuted to construct an expanded set $\bar{B}_i$, from which sampling was performed to match $p_i^{\mathrm{smFRET}}$.
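The two scenarios can be sketched as a per-bin resampling routine. This is one possible reading of the procedure, not the authors' implementation: the function name, the nominal collection size `n_total`, and the rounding of `p * n_total` to a target count per bin are assumptions made for illustration.

```python
import random


def weighted_selection(bins, p_smfret, n_total, seed=0):
    """Resample structures per bin so that bin populations follow p_smfret.

    bins     : list of lists; bins[i] holds the structure IDs in B_i
    p_smfret : experimental smFRET probability per bin (same length as bins)
    n_total  : nominal size of the weighted collection (assumed parameter)
    """
    rng = random.Random(seed)
    selected = []
    for members, p in zip(bins, p_smfret):
        target = round(p * n_total)
        if not members or target == 0:
            continue  # P_weighted(i) = 0 when B_i is empty
        if len(members) >= target:
            # Scenario 1: enough structures, draw a random subset.
            selected.extend(rng.sample(members, target))
        else:
            # Scenario 2: reuse every structure, then fill the remainder.
            full_rounds, remainder = divmod(target, len(members))
            for _ in range(full_rounds):
                selected.extend(rng.sample(members, len(members)))  # permuted copy
            selected.extend(rng.sample(members, remainder))
    return selected
```

Because empty bins are skipped, the returned collection can be smaller than `n_total` whenever the structure collection does not cover the full experimental range.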

Results

One thousand RNA 3D structures of the KL-TLGAAA are sufficient to capture experimental FRET distributions

In silico structure prediction methods generally struggle to identify how many distinct structures are required to represent a structurally dynamic state. Thus, a collection of structures in one conformation is sampled from a diverse set of static structures.

In our work, we compared three widely used tools, RNAComposer, FARFAR2, and AlphaFold3, and evaluated their ability to predict the unbound state of the KL-TL_GAAA_ model construct. Each tool generated 10 000 structures, serving as a baseline for assessing the structural diversity of the unbound state (Fig. 2). Subsets of varying sizes were then sampled from each baseline dataset and compared to the full $E_{\mathrm{DA}}$ distribution using the KLD.

In this context, the bin size of the FRET distribution was crucial: the more finely the FRET distribution was binned, the more structures were required to accurately represent the baseline of the full structure collection of size $N$ (Supplementary Fig. S5). A bin size of $\Delta E_{\mathrm{DA}} = 0.05$ was chosen based on a benchmark study demonstrating an experimental accuracy of $\Delta E = \pm 0.05$ [23], and this criterion is applied throughout the work. In fact, the KLD converged toward zero across all tested $E_{\mathrm{DA}}$ bin sizes (Supplementary Fig. S5). 
The minimum feasible subset is determined by the diversity of the original $E_{\mathrm{DA}}$ distribution. As a result, FARFAR2 requires more structures than RNAComposer and AlphaFold3 to reproduce the baseline (Fig. 2A). RNAComposer and AlphaFold3 begin to converge at a subset size of around 100 structures, while FARFAR2 requires a slightly larger subset due to a wider range of predicted $E_{\mathrm{DA}}$ values (Fig. 2). To ensure a sufficiently large set of structures for subsequent validation while retaining enough conformers to represent the FRET distribution, we chose a structure collection size of 1000 for each tool. This number is sufficient to model FRET in the unbound (low-FRET) state of the investigated construct, although it may need adjustment for RNAs with greater structural complexity.

Accurate structure annotation is critical for selecting reliable KLs

Since proper folding of the tertiary contact in our construct depends on an accurately positioned KL, we specifically analyzed our structure collections to ensure KL formation matched the cryo-EM reference (PDB ID: 3JCT). Folding was first assessed via WC base-pairing patterns observed in the reference structure [46]. Our analysis revealed that the hydrogen-bonding distance between A27 and U59 in the reference exceeded the favorable range for stable WC base pairing and therefore was not classified as canonical. To investigate the stability and variability of the motif, we used all six $1\,\mu\mathrm{s}$ MD simulations of our model construct (Fig. 3A). The MD seed structures were derived from the cryo-EM reference structure and thus started with A27–U59 in a non-canonical geometry. These simulations show that all base pairings, including A27–U59, adopt canonical WC geometry, indicating improved consistency in the MD collection. For subsequent classifications of the KL integrity, we required canonical WC base pairing for all bases except A27–U59, for which spatially proximal representations not necessarily classified under a specific WC base-pairing category were allowed.

To complement WC base-pairing analysis and refine structural characterization of the KL, we computed the eRMSD for all predicted structures relative to the cryo-EM reference (PDB ID: 3JCT). The eRMSD captures geometric deviations in base pairs and stacking interactions, providing a robust measure of RNA motif similarity. Across prediction tools, many structures clustered around $\mathrm{eRMSD} \approx 0.5$ (Fig. 3B left and Supplementary Figs S8 and S9 left). RNAComposer and FARFAR2 showed gradual transitions to higher eRMSD, whereas AlphaFold3 exhibited a distinct cutoff at $\mathrm{eRMSD} = 0.8$, and we apply this value for the assessment of all predictive tools. To assess structural plasticity, each structure was subjected to a short $1\,\mathrm{ns}$ MD simulation, followed by re-annotation of base pairings and re-calculation of eRMSD (Fig. 3B right and Supplementary Figs S8 and S9 right). 
These simulations introduced local fluctuations in the KL, shifting some conformers above $\mathrm{eRMSD} = 0.8$ and increasing the number of non-WC base pairs; the affected structures were excluded from further FRET predictions.

Neither WC base pairing nor eRMSD alone suffices to verify KL preservation in our construct. Using only $\mathrm{eRMSD} \le 0.8$ can retain structures that deviate from the canonical WC pattern and thus do not represent a preserved KL. Conversely, relying solely on WC base pairing can include structures with significantly altered backbone geometries. Applying both criteria together consistently selects folded KL structures with WC base pairing across all three structure prediction tools, which were then used for FRET predictions (Fig. 3C and Supplementary Figs S8 and S9).

The experimental smFRET distribution guides the selection of structures from the predicted collections

Building on the filtered structure collections, we can probe their conformational landscape and guide further refinement using the experimental smFRET distribution. The initial energy-transfer efficiency distributions $E_{\mathrm{DA}}$ derived from ACVs without photon sampling of all filtered structures reveal distinct conformational profiles across the different prediction strategies (Fig. 4A). FARFAR2 and the combined six MD simulations sample a broad range of inter-dye distances, covering not only the experimentally observed FRET distribution of the model construct but also higher energy-transfer efficiencies, yielding mean energy-transfer efficiencies of $0.69 \pm 0.19$ and $0.60 \pm 0.24$, respectively (Table 1, w/o photon sampling). This is also evident in the structural visualizations, which show a nearly globular distribution of different TL_GAAA_ positions. 
In contrast, RNAComposer and AlphaFold3 predominantly yield conformations in the low-FRET region, with mean energy-transfer efficiencies of $0.26 \pm 0.08$ and $0.27 \pm 0.08$, respectively, suggesting structurally constrained collections that arise from a stretched poly(A)-linker conformation limiting the accessible space of the TL_GAAA_ (Fig. 4A and Table 1, w/o photon sampling). Interestingly, these initial distributions already indicate that none of the tools fully cover the experimentally observed FRET distribution.

Figure 4. Comparison of 3D structure prediction tools and MD simulations with the smFRET experiment. (A) Energy-transfer efficiency $E_{\mathrm{DA}}$ distributions for all approaches without photon sampling, and the final 3D structure collections after filtering presented with and without ACVs. (B) FRET distributions with photon statistics simulated by FRETraj. “Unweighted” refers to a uniform contribution of each structure to the burst calculation, whereas “weighted” denotes sampling according to experimental smFRET probabilities. (C) The radar charts show the contribution of individual FRET-selected structures to the weighted collections for each tool. The FARFAR2 collection includes many diverse structures differing in energy-transfer efficiency, while RNAComposer and AlphaFold3 provide the most similar structures, comprising a narrow energy-transfer efficiency distribution. The FRET-guided selection leads to fewer individual structures in the FARFAR2 collection, which are highly populated, while the number of structures from RNAComposer and AlphaFold3 is comparably high but uniformly distributed in the weighted FRET distribution. The population of structures from MD simulations is comparable to the FARFAR2 collection, but comprises more structures to be selected throughout the weighting process, each with a correspondingly lower percentage in the weighted FRET distribution. (D) Number of structures lost during KL validation and the final number selected for weighting. KL filtering was not applied to MD simulation structures.

To directly compare our experimental smFRET distribution with the predicted structure collections, we performed photon burst simulations for each structure collection, including the three RNA prediction tools and MD trajectories. This approach incorporates experimental conditions such as shot noise broadening, direct excitation, and gamma correction of the underlying energy-transfer efficiency $E_{\mathrm{DA}}$ distribution, resulting in an in silico FRET distribution (Fig. 4B). We note that accurate forward modeling of FRET distributions requires experimental control of dye photophysics in the given labeling context and the ACV dye model. Here, fluorescence dynamic anisotropy measurements indicated pronounced dye–RNA interactions for the donor (Supplementary Fig. S3 and Supplementary Tables S2 and S3), consistent with a comparably high donor quantum yield (0.46) used for both FRET prediction and experimental correction. Assuming fast dynamics of the RNA model construct in the FRET experiment, bursts were averaged across MD trajectories and subsamples of predicted structures, respectively. 
This uniform (unweighted) sampling yields in silico mean FRET efficiencies of $0.23 \pm 0.05$ (RNAComposer), $0.77 \pm 0.07$ (FARFAR2), $0.24 \pm 0.05$ (AlphaFold3), and $0.65 \pm 0.08$ (MD simulation), which remain broadly consistent with the corresponding means of the underlying $E_{\mathrm{DA}}$ distributions. The resulting FRET histograms are thus uniformly sampled from all structures (Fig. 4B, unweighted, and Table 1, w/ photon sampling unweighted). Notably, despite partial overlap, none of the in silico FRET distributions generated from RNA 3D prediction tools or MD simulations fully recapitulate the experimental smFRET distribution.
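To illustrate how shot noise broadens an efficiency distribution, the following sketch simulates bursts by binomial photon partitioning. It is a deliberately reduced toy model and not the FRETraj burst algorithm: gamma correction, direct excitation, background, and crosstalk are omitted, and the function name and the `structures_per_burst` parameter (the number of structures averaged per burst under the fast-dynamics assumption) are hypothetical.

```python
import random


def simulate_bursts(efficiencies, burst_sizes, n_bursts, structures_per_burst=10, seed=0):
    """Shot-noise-limited FRET efficiencies for simulated photon bursts.

    efficiencies : per-structure E_DA values of one collection
    burst_sizes  : photon counts per burst to draw from (e.g. experimental sizes)
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n_bursts):
        n_photons = rng.choice(burst_sizes)
        # Fast-exchange assumption: one burst reports the mean over several structures.
        e_mean = sum(rng.choices(efficiencies, k=structures_per_burst)) / structures_per_burst
        # Each photon lands in the acceptor channel with probability e_mean.
        acceptor = sum(1 for _ in range(n_photons) if rng.random() < e_mean)
        observed.append(acceptor / n_photons)
    return observed
```

Smaller burst sizes give a wider simulated histogram around the same mean, which is why the experimental burst-size distribution matters when comparing widths.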

Therefore, we investigated an alternative sampling approach in which structures are not uniformly sampled, but instead selected according to their probability in the experimental smFRET distribution. This strategy enables us to restrict the selection to structures that satisfy the KL filtering criteria while simultaneously enriching conformations that are most likely to be observed in an smFRET experiment. The resulting weighted in silico FRET distributions (Fig. 4B, weighted) now span the experimental smFRET range across all prediction tools and the MD collection, yielding similar mean FRET efficiencies of $0.29 \pm 0.06$ (RNAComposer), $0.32 \pm 0.06$ (FARFAR2), $0.30 \pm 0.06$ (AlphaFold3), and $0.31 \pm 0.06$ (MD simulation) (Table 1, w/ photon sampling weighted). However, the experimental width of the distribution could not be fully reproduced, despite accounting for shot-noise broadening using the experimental burst-size distribution and applying gamma correction.
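One way to picture this probability-based selection is the following minimal sketch. It assumes that the experimental distribution is available as a normalized histogram over equal-width bins on [0, 1] and that each structure carries one predicted efficiency; the binning scheme and the function name are illustrative assumptions, not the procedure used to produce Table 1. A bin is first drawn with its experimental probability, then a structure is drawn uniformly from the candidates in that bin.

```python
import random

def weighted_selection(pred_e, exp_hist, n_draws, seed=0):
    """Draw structure indices so that their efficiencies follow exp_hist.

    pred_e:   predicted efficiency per structure, each in [0, 1]
    exp_hist: experimental probabilities over equal-width bins on [0, 1]
    """
    rng = random.Random(seed)
    n_bins = len(exp_hist)
    bins = [[] for _ in range(n_bins)]
    for idx, e in enumerate(pred_e):
        bins[min(int(e * n_bins), n_bins - 1)].append(idx)
    # Only bins that hold at least one candidate structure can be sampled.
    occupied = [b for b in range(n_bins) if bins[b] and exp_hist[b] > 0]
    if not occupied:
        raise ValueError("no structure overlaps the experimental histogram")
    weights = [exp_hist[b] for b in occupied]
    picks = []
    for _ in range(n_draws):
        b = rng.choices(occupied, weights=weights, k=1)[0]
        picks.append(rng.choice(bins[b]))
    return picks
```

Because a sparsely populated bin with high experimental probability is served by few candidates, the same structures are drawn repeatedly from it, which is the overrepresentation effect discussed next.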

Notably, this probability-based selection can overrepresent individual structures when the underlying structure collections exhibit only a limited overlap with the experimental smFRET distribution. In the weighted RNAComposer collection, individual structures contribute >1% of the total collection, whereas FARFAR2 shows even higher values with single structures exceeding 1.5% and approaching 2%. In contrast, AlphaFold3 and the MD collection remain more diverse, with all individual structures contributing <1% (Fig. 4C). How strongly individual structures are overrepresented depends on the initial overlap between each KL filtered structure collection and the experimental smFRET distribution. A broader coverage provides more candidates per bin and supports more heterogeneous sampling. Consequently, the number of unique structures retained in the weighted collections differs across the prediction tools and the MD simulations. Consistent with this, the weighted structure collections comprise 431 (RNAComposer), 215 (FARFAR2), 533 (AlphaFold3), and 400 (MD) unique structures (Fig. 4D and Supplementary Fig. S11).
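The two diagnostics reported here, the number of unique structures and the largest share held by a single structure, can be computed from any list of selected structure identifiers. The helper below is a small illustrative sketch with a hypothetical name; it does not reproduce the values in Fig. 4C or 4D.

```python
from collections import Counter

def selection_diversity(picks):
    """Return (number of unique structures, largest single-structure share)."""
    counts = Counter(picks)
    max_share = max(counts.values()) / len(picks)
    return len(counts), max_share
```

For example, `selection_diversity([1, 1, 2, 3])` reports three unique structures, with the most frequent one accounting for half of the selection.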

Discussion

FRET-guided integrative modeling has often relied on MD simulations to generate structure collections for comparison with experimental data [25, 27, 47]. While MD provides physically grounded structure collections, it is computationally demanding and often limited in its ability to efficiently sample broad conformational landscapes, particularly for flexible RNAs. Building on earlier work in FRET-assisted modeling [27, 48, 49], we examine whether modern RNA 3D structure prediction tools can serve as practical alternatives to MD-based sampling. A central challenge in this context is that a single smFRET distance distribution is intrinsically underdetermined for complex RNA 3D folds: even after applying physical plausibility filters, multiple conformations can remain compatible with the same observable. Accordingly, rather than interpreting the modeling outcome as a single structure, our goal is to obtain a physically plausible collection of structures that is consistent with the measured FRET distribution under defined biochemical conditions. Within this framework, we assess the number of structures required to adequately represent a heterogeneous FRET state observed in a single FRET experiment, as well as the diversity and reliability of the resulting in silico FRET predictions and structure collections. For unbound and conformationally flexible RNAs, retaining heterogeneous solutions is therefore not a limitation of the approach, but a realistic representation of the expected in-solution state landscape.

Most notably, our work demonstrates that RNAComposer and AlphaFold3 predominantly model the unbound state between a GAAA tetraloop and a KL receptor as an extended conformation, resulting in a narrow distribution of low energy-transfer efficiencies that overlap with the experimental smFRET data. In our construct, a poly(A) linker connects the $\mathrm{TL_{GAAA}}$ with the KL and the linker is not expected to form stable interactions with either element in the presence of potassium(I) only [32]. The observed dye distance $R_{\mathrm{DA}}$, and accordingly the energy-transfer efficiency, are influenced by the conformational flexibility of the linker [50–53]. Poly(A)-linkers are known to adopt stacked base conformations in transiently stable states [54, 55], which may restrict the $\mathrm{TL_{GAAA}}$ range of motion, consistent with conformations predicted by RNAComposer and AlphaFold3.
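For orientation, the dependence of the transfer efficiency on the dye distance follows the standard Förster relation. The Förster radius $R_0$ of the dye pair is not given in this section and is kept symbolic; the second expression, the mean over a distance distribution $p(R_{\mathrm{DA}})$, is written here only as a generic illustration of how linker flexibility enters the observable.

$$
E_{\mathrm{DA}} = \frac{1}{1 + \left( R_{\mathrm{DA}} / R_0 \right)^{6}},
\qquad
\langle E_{\mathrm{DA}} \rangle = \int p(R_{\mathrm{DA}})\, E_{\mathrm{DA}}(R_{\mathrm{DA}})\, \mathrm{d}R_{\mathrm{DA}}
$$

Because of the sixth-power dependence, extended linker conformations with $R_{\mathrm{DA}} > R_0$ map to low efficiencies, whereas compact conformations with $R_{\mathrm{DA}} < R_0$ map to high efficiencies.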

Interestingly, FARFAR2 generates a more diverse structure collection yielding in silico energy-transfer efficiencies that overlap with the experimental smFRET distribution. This includes poly(A)-linker conformations that allow greater spatial freedom of the $\mathrm{TL_{GAAA}}$, resulting in higher FRET efficiencies. These observations indicate that, despite identical initial conditions, the poly(A)-linker can explore a vast conformational space on the nanosecond timescale (Supplementary Figs S10 and S11).

Importantly, the differences in structural diversity observed across the three RNA 3D prediction approaches should not be interpreted in terms of “superior” or “inferior” performance, but rather as a consequence of their distinct design philosophies within the RNA-Puzzles community [11]. FARFAR2 is explicitly designed to sample the conformational space via fragment- and base-step sampling, combined with a physics-inspired scoring function, thereby generating structure collections that populate multiple low-energy basins. While this strategy enables broad structural diversity, it incurs substantial computational cost, and exhaustive sampling remains challenging for larger RNAs [37]. In contrast, RNAComposer primarily assembles 3D models from a secondary-structure-driven fragment library and typically returns a limited number of top-ranked low-energy solutions [56]. AlphaFold3, as a diffusion-based learning approach for biomolecular complexes including nucleic acids, similarly prioritizes high-confidence predictions over thermodynamic structure collections [2]. These differences are particularly relevant for our study, since the unbound state includes a flexible single-stranded poly(A)-linker connecting structured domains. Increasing evidence shows that single-stranded nucleic acids can exhibit broad, sequence- and condition-dependent conformational distributions and fast chain dynamics, making accurate modeling of such linkers non-trivial [53]. The broad structural spectrum generated by FARFAR2 can be interpreted as unbound conformations with substantial poly(A)-linker flexibility and conformations in which the tertiary contact is partially formed, i.e. precursor states where the GAAA tetraloop and the KL are in close spatial proximity. The latter correspond to high transfer efficiencies that are not apparent in the experimental distribution, although we note that smFRET may report time-averaged structure collections within the burst time window. 
The narrower distributions obtained from RNAComposer and AlphaFold3 reflect an unbound state, characterised by an elongated poly(A)-linker, which also figures among the structures predicted by FARFAR2.

Complementing the prediction-derived structure collections, MD simulations provide an independent, physics-based reference for interpreting a FRET distribution of a conformationally flexible RNA. Specifically, they model the dominant degrees of freedom of the poly(A)-linker on a physical timescale of nano- to microseconds. Practically, the MD simulations were initiated from multiple randomized starting orientations of the poly(A)-linker relative to the structured motif of the KL, resulting in an energy-transfer efficiency distribution that overlaps with those obtained from FARFAR2 and RNAComposer/AlphaFold3.

To evaluate how well predicted structure collections reproduce experimentally observed FRET signals, we simulated photon bursts using FRETraj [27, 38, 39, 43]. This forward modeling ensures that the simulated FRET values reflect those expected in a real smFRET experiment. In the case of MD trajectories, photons were simulated from structures linked on a time dimension. Within a burst, photons were sampled from the same trajectory. Conversely, structures generated by 3D prediction tools are not physically linked in time and thus were treated as if they were rapidly sampled during the photon detection time window. Consequently, the Monte Carlo simulation models a fast-exchange regime, where the RNA explores the conformational space on short timescales compared to the burst window. The detected FRET signal reflects the corresponding time-averaged efficiencies. The simulated FRET distributions derived from both MD and FARFAR2 converge to similar mean FRET values of 0.65 and 0.77, respectively. This convergence illustrates that both static structural heterogeneity and fast conformational dynamics can yield the same average FRET signal when the motion timescale is fast relative to the burst detection time window. However, neither MD nor FARFAR2 fully reproduces the experimental FRET distribution.
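The effect of the fast-exchange assumption on the distribution width can be made concrete with a small numerical sketch. It ignores photon statistics entirely and only averages per-structure efficiencies within each burst; the function name and the number of structures visited per burst are illustrative assumptions. The mean is unaffected by the averaging, while the spread of per-burst values shrinks as more structures are visited.

```python
import random
import statistics

def burst_mean_spread(efficiencies, structures_per_burst,
                      n_bursts=2000, seed=0):
    """Mean and standard deviation of per-burst averaged efficiencies."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_bursts):
        # Structures visited within one burst window (fast exchange).
        visited = [rng.choice(efficiencies)
                   for _ in range(structures_per_burst)]
        means.append(sum(visited) / structures_per_burst)
    return statistics.mean(means), statistics.pstdev(means)
```

Comparing `burst_mean_spread(e, 1)` with `burst_mean_spread(e, 25)` for a broad set of efficiencies shows similar means but a clearly smaller standard deviation in the second case, which is why time-averaged simulations tend to yield narrower histograms than the underlying per-structure distribution.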

As FARFAR2 generates a broad set of physically plausible conformations, one could hypothesize that a subset of these seemingly incompatible conformations would occur under different ionic conditions than those used in the present experiment. In fact, the higher-FRET conformations may represent precursor states toward a stable tertiary interaction between GAAA and the KL, as observed in the cryo-EM reference structure. This interpretation leads to a potential advantage of sampling-oriented prediction: it can produce conformations that may become relevant under altered conditions (e.g. different cation composition) or along a binding/folding pathway, even if they are not prominently populated in a given solution experiment. Finally, relevant RNA dynamics may also occur on slower timescales (microseconds to milliseconds) than assumed in the fast-exchange photon-burst simulation [57–59]. Longer MD simulations and/or enhanced sampling may be required to access more extended conformations associated with lower transfer efficiencies, as produced by RNAComposer and AlphaFold3.

To reconcile experimental and simulated distributions, we introduced a FRET-guided reweighting strategy. This procedure resamples the structure collections to be compatible with the experimental FRET distribution (Table 1, w/ photon sampling weighted). A central prerequisite of this approach is that the FRET experiment probes the intended structural state. In the present case, we assume that the measured smFRET trajectory indeed reports on the unbound state, and that the observed FRET dynamics primarily reflect the mobility and spatial exploration of the GAAA tetraloop under the chosen solution conditions. In other words, the reweighting procedure is meaningful only if the FRET observable is sensitive to the conformational transition of interest and if the underlying construct populates the corresponding conformations in solution. Importantly, agreement between predicted and experimental FRET distributions can only be achieved if structural candidates with experimentally compatible $R_{\mathrm{DA}}$ and $E_{\mathrm{DA}}$ values are present in the predicted structure collections. Reweighting will not generate new conformations; it only redistributes probability mass among conformations that exhibit overlap between the predicted and experimentally observed FRET distributions. Consequently, higher conformational coverage will increase the success rate of reweighting.
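The coverage prerequisite can be quantified with a simple check: the fraction of experimental probability mass that falls into efficiency bins containing at least one candidate structure. The sketch below assumes equal-width bins on [0, 1] and a hypothetical function name; it is an illustration of the argument, not part of the published workflow. A value well below 1 indicates that reweighting cannot reproduce the experimental histogram, however the weights are chosen.

```python
def covered_mass(pred_e, exp_hist):
    """Fraction of experimental probability mass reachable by reweighting.

    pred_e:   predicted efficiency per structure, each in [0, 1]
    exp_hist: experimental histogram over equal-width bins on [0, 1]
    """
    n_bins = len(exp_hist)
    # Bins that contain at least one predicted structure.
    occupied = {min(int(e * n_bins), n_bins - 1) for e in pred_e}
    covered = sum(p for b, p in enumerate(exp_hist) if b in occupied)
    return covered / sum(exp_hist)
```

With `exp_hist = [0.5, 0.25, 0.25, 0.0]`, a collection confined to the first bin covers only half of the experimental mass, whereas a collection spanning the first three bins covers all of it.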

After reweighting, all three RNA 3D structure prediction tools and the MD simulation yield FRET distributions with similar means and widths. Among the prediction algorithms, FARFAR2 most closely reproduces the experimental mean FRET efficiency (0.32), though its distribution remains slightly narrower than that observed experimentally, which is a consequence of structural averaging during burst simulation in FRETraj. RNAComposer and AlphaFold3 exclusively covered low-FRET states and thus tended to miss intermediate FRET bins. Additionally, structural entanglements were observed across all prediction methods and may contribute to a small fraction of sterically or topologically implausible conformations (Supplementary Table S4). However, their impact is limited here, because enforcing correct KL formation effectively excludes entanglements involving the linker region.

In summary, our workflow integrates RNA 3D structure prediction with smFRET distance distributions to perform solution-state selection among physically plausible models. We showed that none of the validated structure collections reproduces the low-FRET distribution without reweighting. A key advantage of this approach is that smFRET can efficiently reject implausible candidates and enrich for conformations consistent with the defined experimental conditions. Capturing the highly dynamic nature of RNA [60] thus requires structure prediction approaches that provide sufficiently diverse structure collections. Precise single-molecule measurements impose realistic constraints on this candidate pool, enabling effective re-sampling and selection of concordant conformations. Applying this strategy to larger and more complex RNAs will likely require multiple non-redundant FRET coordinates that report on orthogonal conformational transitions. Overall, this work supports a gradual paradigm shift from picking a single “best” model toward identifying collections of models that are structurally compatible with solution-based experimental observables.

Supplementary Material

gkag147_Supplemental_File

Bibliography


1. Karplus M, McCammon JA. Molecular dynamics simulations of biomolecules. Nat Struct Biol. 2002;9:646–52. doi:10.1038/nsb0902-646. PMID 12198485.
2. Abramson J, Adler J, Dunger J et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500. doi:10.1038/s41586-024-07487-w. PMID 38718835. PMC11168924.
3. Nam K, Wolf-Watz M. Protein dynamics: the future is bright and complicated! Struct Dyn. 2023;10:014301. doi:10.1063/4.0000179. PMID 36865927. PMC9974214.
4. Hospital A, Goñi JR, Orozco M et al. Molecular dynamics simulations: advances and applications. Adv Appl Bioinform Chem. 2015;8:37–47. doi:10.2147/AABC.S70333. PMID 26604800. PMC4655909.
5. Statello L, Guo CJ, Chen LL et al. Gene regulation by long noncoding RNAs and its biological functions. Nat Rev Mol Cell Biol. 2021;22:96–118. doi:10.1038/s41580-020-00315-9. PMID 33353982. PMC7754182.
6. Pucci F, Schug A. Shedding light on the dark matter of the biomolecular structural universe: progress in RNA 3D structure prediction. Methods. 2019;162-163:68–73. doi:10.1016/j.ymeth.2019.04.012. PMID 31028927.
7. Cruz JA, Blanchet MF, Boniecki M et al. RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction. RNA. 2012;18:610–25. doi:10.1261/rna.031054.111. PMID 22361291. PMC3312550.
8. Miao Z, Adamiak RW, Blanchet MF et al. RNA-Puzzles Round II: assessment of RNA structure prediction programs applied to three large RNA structures. RNA. 2015;21:1066–84. doi:10.1261/rna.049502.114. PMID 25883046. PMC4436661.