NLP Meets RNA: Unsupervised Embedding Learning for Ribozymes with Word2Vec
Andrew Kean Gao

TL;DR
This paper applies Word2Vec, an NLP embedding technique, to learn vector representations of ribozyme sequences. The resulting embeddings separate ribozyme classes and support a simple downstream classifier.
Contribution
It introduces Ribo2Vec, the first application of Word2Vec to ribozyme sequences, demonstrating effective embeddings for classification and analysis.
Findings
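Principal component analysis (PCA) of the embeddings distinguishes between ribozyme classes.
A simple SVM classifier trained on the embeddings shows promising classification accuracy.
256-dimensional embeddings perform similarly to 128-dimensional ones, suggesting the lower dimensionality suffices.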
Abstract
Ribozymes, RNA molecules with distinct 3D structures and catalytic activity, have widespread applications in synthetic biology and therapeutics. However, relatively little research has focused on leveraging deep learning to enhance our understanding of ribozymes. This study implements Word2Vec, an unsupervised learning technique for natural language processing, to learn ribozyme embeddings. Ribo2Vec was trained on over 9,000 diverse ribozymes, learning to map sequences to 128- and 256-dimensional vector spaces. Using Ribo2Vec, sequence embeddings for five classes of ribozymes (hatchet, pistol, hairpin, hovlinc, and twister sister) were calculated. Principal component analysis demonstrated the ability of these embeddings to distinguish between ribozyme classes. Furthermore, a simple SVM classifier trained on ribozyme embeddings showed promising results in accurately classifying ribozyme…
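To make the pipeline in the abstract more concrete, here is a minimal sketch of how an RNA sequence might be turned into "words" and then into a single sequence embedding. The summary does not say how Ribo2Vec tokenizes sequences or pools token vectors, so the overlapping k-mer tokenization (`k=3`), the mean pooling, the helper names `kmer_tokens` and `mean_embedding`, and the toy vectors are all assumptions for illustration, not the paper's method. In practice the token vectors would come from a trained Word2Vec model (for example gensim's `Word2Vec` with `vector_size=128`) rather than a hand-written table.

```python
from typing import Dict, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers, the "words" of the sequence.

    Assumption: overlapping k-mers with stride 1; the source does not
    specify the tokenization used by Ribo2Vec.
    """
    seq = seq.upper().replace("T", "U")  # normalize DNA-style input to RNA
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mean_embedding(tokens: List[str],
                   vectors: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of all known tokens into one sequence embedding.

    Tokens missing from `vectors` are skipped; if none are known, a zero
    vector is returned. Mean pooling is an assumption, not the paper's
    stated choice.
    """
    total = [0.0] * dim
    known = 0
    for tok in tokens:
        vec = vectors.get(tok)
        if vec is None:
            continue
        for j, x in enumerate(vec):
            total[j] += x
        known += 1
    return [x / known for x in total] if known else total


# Hypothetical 2-dimensional token vectors standing in for a trained model.
toy_vectors = {"GGA": [1.0, 0.0], "GAU": [0.0, 1.0]}

tokens = kmer_tokens("GGAUC", k=3)
embedding = mean_embedding(tokens, toy_vectors, dim=2)
```

Fixed-length embeddings produced this way could then be passed to PCA for visualization or to an SVM for classification, as the abstract describes.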
Taxonomy
Topics: RNA and protein synthesis mechanisms · RNA modifications and cancer · Machine Learning in Bioinformatics
Methods: Support Vector Machine
