# Deep generative model of RNAs based on variational autoencoder with context-free grammar

**Authors:** Goro Terai, Kiyoshi Asai

PMC12342829 · DOI: 10.1093/bioinformatics/btaf427 · Bioinformatics · 2025-07-29

## TL;DR

This paper introduces a deep generative model that combines a context-free grammar with a variational autoencoder to generate RNA sequences while accounting for each sequence's own secondary structure, with the aim of improving RNA design for research and bioengineering.

## Contribution

A novel deep generative model that combines context-free grammar (CFG) with a variational autoencoder (VAE) to generate RNA sequences with explicit awareness of their individual secondary structures.

## Key findings

- When evaluated on natural RNAs from the Rfam database, the model generates high-quality RNA sequences.
- For RNA aptazyme mutants with distinct secondary structures, the VAE's latent space representation correlates strongly with self-cleaving activity.
- The optimal parse tree is reconstructed by dynamic programming, which keeps sequence generation structure-aware.

## Abstract

RNA plays a crucial role in cellular functions, and designing functional RNA sequences is essential for both scientific exploration and bioengineering applications. Conventional RNA design approaches typically assume a shared secondary structure among designed sequences. However, even closely related RNAs can adopt different secondary structures, particularly when artificial mutations are introduced.

We present a novel deep generative model that integrates context-free grammar (CFG) with a variational autoencoder (VAE) to generate RNA sequences while explicitly considering their individual secondary structures. In our method, RNA sequences and their structures are represented as parse trees based on CFG, which are then transformed into binary matrices for VAE training. The optimal parse tree is reconstructed using dynamic programming, ensuring structure-aware sequence generation. When evaluated on natural RNAs from the Rfam database, our model successfully generates high-quality RNA sequences. Furthermore, when applied to RNA aptazyme mutants with distinct secondary structures, our method reveals a strong correlation between the latent space representation of the VAE and self-cleaving activity. This underscores the importance of incorporating RNA-specific structural information in generative models.
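The parse-tree-to-binary-matrix step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-rule toy grammar, the names `encode` and `decode`, and the dot-bracket input format are not taken from the paper, whose actual grammar and matrix layout may differ. The sketch also decodes by reading exact one-hot rows, whereas the paper reconstructs the optimal parse tree with dynamic programming.

```python
# Toy grammar (an illustrative assumption, not the grammar used in the paper):
#   S -> x S        unpaired base x            (4 rules)
#   S -> x S y S    canonical base pair x-y    (6 rules)
#   S -> (empty)    end of a substructure      (1 rule)
BASES = "ACGU"
PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]
RULES = [("unpaired", b) for b in BASES] + [("pair", p) for p in PAIRS] + [("end", "")]
RULE_INDEX = {rule: i for i, rule in enumerate(RULES)}


def pair_table(structure):
    """Map each opening-bracket position to its closing partner."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner[stack.pop()] = i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def derive(seq, partner, i, j, out):
    """Append the leftmost-derivation rule indices for seq[i:j] to out."""
    while i < j:
        if i in partner:
            k = partner[i]
            # Raises KeyError for a non-canonical pair; the toy grammar has no rule for it.
            out.append(RULE_INDEX[("pair", seq[i] + seq[k])])
            derive(seq, partner, i + 1, k, out)  # inner S between the paired bases
            i = k + 1                            # continue with the trailing S
        else:
            out.append(RULE_INDEX[("unpaired", seq[i])])
            i += 1
    out.append(RULE_INDEX[("end", "")])


def encode(seq, structure):
    """Return a binary matrix: one row per rule application, one column per rule."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    rules = []
    derive(seq, pair_table(structure), 0, len(seq), rules)
    return [[1 if col == r else 0 for col in range(len(RULES))] for r in rules]


def decode(matrix):
    """Rebuild (sequence, structure) from an exact one-hot rule matrix."""
    rules = iter(row.index(1) for row in matrix)
    seq, struct = [], []

    def expand():
        while True:
            kind, symbol = RULES[next(rules)]
            if kind == "end":
                return
            if kind == "unpaired":
                seq.append(symbol)
                struct.append(".")
            else:
                seq.append(symbol[0])
                struct.append("(")
                expand()  # inner S, runs until its own end rule
                seq.append(symbol[1])
                struct.append(")")

    expand()
    return "".join(seq), "".join(struct)


matrix = encode("GGGAAACCC", "(((...)))")
print(len(matrix), len(matrix[0]))  # rows = rule applications, columns = rule types
print(decode(matrix))               # round trip back to the input pair
```

A VAE needs fixed-size inputs, so a real pipeline would also pad or otherwise normalise these matrices; that step is left out here. Decoder outputs are real-valued rather than one-hot, which is why a search for the best-scoring valid parse, such as the dynamic programming the abstract mentions, is needed in place of the row-by-row read in `decode`.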

**Availability:** https://github.com/gterai/RNAgg (archived at Zenodo: https://doi.org/10.5281/zenodo.15354990).

## Full-text entities

- **Genes:** TRNG (tRNA-Gly) [NCBI Gene 4563] {aka MTTG}

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12342829/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12342829/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC12342829/full.md

---
Source: https://tomesphere.com/paper/PMC12342829