# Semi-supervised segmentation of RNA 3D structures using density-based clustering

**Authors:** Quoc Khang Le, Eric Angel, Fariza Tahi, Guillaume Postic

PMC · DOI: 10.1016/j.csbj.2025.08.037 · Computational and Structural Biotechnology Journal · 2025-09-10

## TL;DR

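This paper introduces RNA3DClust, a method that segments RNA 3D structures into compact, spatially separate "3D domains" using Mean Shift clustering. The resulting domains are consistent with the function- and evolution-based domain annotations of the Rfam database.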

## Contribution

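The paper introduces RNA3DClust, a method for RNA 3D domain segmentation that combines Mean Shift clustering with a dedicated post-clustering procedure. To tune and benchmark it, the authors also built reference datasets of RNA 3D domain annotations and devised a new scoring function, the Chain Segment Distance (CSD), which evaluates segmentation quality and is not part of the clustering itself.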
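To make the clustering step concrete, here is a minimal sketch of flat-kernel Mean Shift on 3D coordinates. It is not the RNA3DClust implementation: the function name `mean_shift`, the `bandwidth` value, the mode-merging threshold, and the toy coordinates (imagined as one representative atom per nucleotide) are all assumptions made for illustration.

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Flat-kernel Mean Shift on a list of coordinate tuples.

    Returns (labels, modes): one cluster label per input point and the
    list of density modes found. Illustrative only, O(n^2) per iteration.
    """
    def shift(p):
        # Repeatedly move p to the centroid of the points within `bandwidth`.
        for _ in range(max_iter):
            neighbors = [q for q in points if math.dist(p, q) <= bandwidth]
            new_p = tuple(sum(c) / len(neighbors) for c in zip(*neighbors))
            if math.dist(p, new_p) < tol:
                return new_p
            p = new_p
        return p

    modes, labels = [], []
    for p in points:
        m = shift(p)
        # Merge with an existing mode if close enough (threshold is an assumption).
        for i, existing in enumerate(modes):
            if math.dist(m, existing) < bandwidth / 2:
                labels.append(i)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels, modes

# Hypothetical toy "structure": two compact, well-separated groups of points.
coords = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1),   # first group
    (20, 0, 0), (21, 0, 0), (20, 1, 0), (21, 1, 1),          # second group
]
labels, modes = mean_shift(coords, bandwidth=5.0)
print(labels)
```

Mean Shift does not need the number of clusters in advance; the bandwidth controls how coarse the partition is, which is why a method built on it needs tuning against reference annotations.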

## Key findings

- RNA3DClust successfully partitions RNA 3D structures into biologically meaningful domains.
- A new scoring function called Chain Segment Distance (CSD) was developed to evaluate segmentation quality.
- The method's results align with RNA domain annotations from the Rfam database.
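A domain partition such as those described above can be stored as one cluster label per nucleotide and then read off as contiguous chain segments. The sketch below shows that bookkeeping only; it is not the paper's post-clustering procedure or its CSD score, neither of which is detailed in this summary. The names `labels_to_segments` and `group_by_domain` and the example labels are hypothetical.

```python
from itertools import groupby

def labels_to_segments(labels):
    """Collapse per-nucleotide labels into (label, start, end) runs.

    Positions are 1-based and inclusive, following the usual residue numbering.
    """
    segments = []
    pos = 1
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, pos, pos + length - 1))
        pos += length
    return segments

def group_by_domain(segments):
    """Map each label to its list of (start, end) segments.

    A domain may be discontinuous along the chain, so it can own several segments.
    """
    domains = {}
    for label, start, end in segments:
        domains.setdefault(label, []).append((start, end))
    return domains

# Hypothetical labels for a 10-nucleotide chain.
example_labels = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2]
segments = labels_to_segments(example_labels)
domains = group_by_domain(segments)
print(segments)
print(domains)
```

In this toy example, label 0 covers positions 1-3 and 6-7, which shows why segment boundaries, and not just cluster membership, matter when comparing two decompositions of the same chain.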

## Abstract

A growing body of evidence shows that the biological activity of RNA molecules is not only due to their primary and secondary structures, but also to their spatial conformation. This is analogous to proteins, where investigating function, folding, or evolution often requires dividing the three-dimensional (3D) structure into subparts that can be studied individually. These independent substructures, known as protein “3D domains”, are geometrically defined as compact and spatially separate regions of the polypeptide chain. In RNA macromolecules, however, and to the best of our knowledge, no equivalent 3D-based concept has yet been formulated. We present RNA3DClust, an application of the Mean Shift clustering algorithm to the RNA 3D structure partitioning problem. For this work, a dedicated post-clustering procedure was developed to address the peculiarities of delimiting 3D domains in RNA conformations. Tuning and benchmarking RNA3DClust required us to create reference datasets of RNA 3D domain annotations and to devise a new scoring function—the Chain Segment Distance (CSD)—for assessing segmentation quality. Importantly, we show that the domain decompositions produced by RNA3DClust are consistent with those based on RNA biological function and evolution. Finally, the emerging interest in long non-coding RNAs (lncRNAs) and their likeliness of containing folded regions has motivated us to generate an additional reference dataset of lncRNA predicted conformations. The resulting delineations of 3D domains by RNA3DClust illustrate the potential of our method for analyzing lncRNA 3D structures. Source code and datasets are freely available for download on the EvryRNA platform at: https://evryrna.ibisc.univ-evry.fr.

- This study introduces the concept of compact and separate “3D domains” in RNA.
- We used Mean Shift and post-processing to segment RNA structures into 3D domains.
- We created a new scoring function to assess the quality of 3D domain decompositions.
- RNA 3D domains show consistency with domain annotations from the Rfam database.

## Full-text entities

- **Diseases:** neurological disorders (MESH:D009461), Alzheimer's disease (MESH:D000544), cardiovascular diseases (MESH:D002318), cancer (MESH:D009369)
- **Chemicals:** lysine (MESH:D008239), nucleotide (MESH:D009711), glucosamine-6-phosphate (MESH:C001293)
- **Species:** [Eubacterium] siraeum (species) [taxon 39492], Triticum aestivum (bread wheat, species) [taxon 4565], Oryctolagus cuniculus (domestic rabbit, species) [taxon 9986], Thermotoga maritima (species) [taxon 2336], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Homo sapiens (human, species) [taxon 9606], Trypanosoma brucei brucei (subspecies) [taxon 5702], Escherichia coli (E. coli, species) [taxon 562], hepatitis C virus [taxon 11103], Geobacillus stearothermophilus (species) [taxon 1422], Taura syndrome virus (no rank) [taxon 142102], Tetrahymena (genus) [taxon 5890]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12800371/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12800371/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC12800371/full.md

---
Source: https://tomesphere.com/paper/PMC12800371