# GINClus: RNA structural motif clustering using graph isomorphism network

**Authors:** Nabila Shahnaz Khan, Md Mahfuzur Rahaman, Shaojie Zhang

PMC · DOI: 10.1093/nargab/lqaf050 · NAR Genomics and Bioinformatics · 2025-04-26

## TL;DR

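This paper introduces GINClus, a deep learning tool that clusters RNA structural motif candidates using graph representations of their base interactions and 3D structures. It reaches 87.88% clustering accuracy on known internal loop motifs and 97.69% on known hairpin loop motifs, and it surfaces 12 new motif families.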

## Contribution

The main contribution is GINClus, a semi-supervised deep learning model that clusters RNA motif candidates (RNA loop regions) using a graph isomorphism network (GIN) combined with K-means and hierarchical agglomerative clustering.

## Key findings

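- GINClus achieved a clustering accuracy of 87.88% for known internal loop motifs and 97.69% for known hairpin loop motifs.
- The tool identified 927 new instances of known RNA motif families (Sarcin-ricin, Kink-turn, Tandem-shear, Hook-turn, E-loop, C-loop, T-loop, and GNRA loop) and 12 new motif families with unique structures and base-pair interactions.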

## Abstract

Ribonucleic acid (RNA) structural motif identification is a crucial step for understanding RNA structure and functionality. Due to the complexity and variations of RNA 3D structures, identifying RNA structural motifs is challenging and time-consuming. In particular, discovering new RNA structural motif families is a hard problem that still largely depends on manual analysis. In this paper, we propose an RNA structural motif clustering tool, named GINClus, which uses a semi-supervised deep learning model to cluster RNA motif candidates (RNA loop regions) based on both base interaction and 3D structure similarities. GINClus converts the base interactions and 3D structures of RNA motif candidates into graph representations, then uses a graph isomorphism network (GIN) model in combination with K-means and hierarchical agglomerative clustering to cluster the candidates based on their structural similarities. GINClus has a clustering accuracy of 87.88% for known internal loop motifs and 97.69% for known hairpin loop motifs. Using GINClus, we successfully clustered the motifs of the same families together and were able to find 927 new instances of Sarcin-ricin, Kink-turn, Tandem-shear, Hook-turn, E-loop, C-loop, T-loop, and GNRA loop motif families. We also identified 12 new RNA structural motif families with unique structure and base-pair interactions.
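To make the graph step in the abstract more concrete, here is a minimal sketch of the neighbourhood aggregation that graph isomorphism networks are built on. It is not the GINClus implementation: the learned MLP that normally follows each aggregation is omitted, the constant node features are placeholders, and the names `gin_layer`, `readout`, and `embed` are hypothetical. The toy graphs stand in for motif candidates whose nodes would be nucleotides and whose edges would be base interactions.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]        # adjacency list: node -> neighbours
Features = Dict[int, List[float]]   # node -> feature vector


def gin_layer(adj: Graph, feats: Features, eps: float = 0.0) -> Features:
    """One GIN-style step: (1 + eps) * h_v plus the sum of neighbour features.

    Assumption: the trainable MLP applied after this sum is left out.
    """
    out: Features = {}
    for v, h in feats.items():
        agg = [(1.0 + eps) * x for x in h]
        for u in adj[v]:
            agg = [a + b for a, b in zip(agg, feats[u])]
        out[v] = agg
    return out


def readout(feats: Features) -> List[float]:
    """Sum node features into one graph-level vector (order independent)."""
    dim = len(next(iter(feats.values())))
    return [sum(h[i] for h in feats.values()) for i in range(dim)]


def embed(adj: Graph, feats: Features, layers: int = 2) -> List[float]:
    """Concatenate the readout of every layer into a graph embedding."""
    parts: List[float] = []
    for _ in range(layers):
        feats = gin_layer(adj, feats)
        parts.extend(readout(feats))
    return parts


# Toy graphs with placeholder features (every node starts at [1.0]).
ones = {0: [1.0], 1: [1.0], 2: [1.0]}
path = {0: [1], 1: [0, 2], 2: [1]}            # 0 - 1 - 2
path_relabeled = {0: [2], 2: [0, 1], 1: [2]}  # same shape, centre is node 2
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print(embed(path, ones))            # [7.0, 17.0]
print(embed(path_relabeled, ones))  # [7.0, 17.0]
print(embed(triangle, ones))        # [9.0, 27.0]
```

The two path graphs receive the same embedding regardless of node labelling, while the triangle is separated from them. Embeddings of this kind are what a downstream K-means or hierarchical agglomerative clustering step would group; how GINClus configures those steps is described in the full paper, not here.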

## Full-text entities

- **Diseases:** CL (MESH:D001765)
- **Chemicals:** cytosine (MESH:D003596), 23S rRNA (-), uracil (MESH:D014498), adenine (MESH:D000225)
- **Species:** Thermus thermophilus (species) [taxon 274], Haloarcula marismortui (species) [taxon 2238]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12034103/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12034103/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12034103/full.md

---
Source: https://tomesphere.com/paper/PMC12034103