# PymiRa: A rapid and accurate classification tool for small non-coding RNAs, including microRNAs

**Authors:** Zachary G. L. Scurlock, Cinzia G. Scarpini, Nicholas Coleman, Matthew J. Murray, Anton J. Enright

PMC · DOI: 10.1371/journal.pcbi.1014114 · PLOS Computational Biology · 2026-03-26

## TL;DR

PymiRa is a new Python tool for quickly and accurately identifying and quantifying small non-coding RNAs, such as microRNAs, from sequencing data.

## Contribution

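PymiRa applies a Burrows-Wheeler alignment to miRBase hairpin precursors. Earlier tools used either a Burrows-Wheeler genome alignment or a dynamic programming alignment to precursors; combining the two approaches improves both accuracy and speed in miRNA classification.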

## Key findings

- PymiRa uses a Burrows-Wheeler algorithm to align reads against miRBase hairpin precursors, permitting up to two mismatches at the 3’ end of a read.
- The tool improves accuracy and efficiency compared to existing methods.
- PymiRa is publicly available via GitHub and/or a webserver and will be revised over time, e.g. with miRBase version updates.
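To make the 3’ mismatch rule concrete, here is a minimal sketch of tolerant matching against hairpin precursors. It is not PymiRa's implementation: it uses a naive sliding-window scan instead of a Burrows-Wheeler index, and the function names, the `tail_window` parameter (how many trailing nucleotides count as "the 3’ end"), and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def match_with_3p_tolerance(
    read: str,
    hairpin: str,
    max_mismatches: int = 2,   # from the abstract: up to two 3' mismatches
    tail_window: int = 3,      # assumption: mismatches only allowed in the last 3 nt
) -> Optional[Tuple[int, int]]:
    """Return (offset, mismatches) of the best placement of read on hairpin, or None."""
    n = len(read)
    if n == 0 or n > len(hairpin):
        return None
    body_len = max(n - tail_window, 0)
    best = None
    for offset in range(len(hairpin) - n + 1):
        window = hairpin[offset:offset + n]
        # The 5' portion of the read must match exactly.
        if window[:body_len] != read[:body_len]:
            continue
        # Count mismatches in the 3' tail only.
        mismatches = sum(a != b for a, b in zip(read[body_len:], window[body_len:]))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return best


def count_reads(reads: Iterable[str], hairpins: Dict[str, str]) -> Counter:
    """Assign each read to the first hairpin it matches and tally the counts."""
    counts: Counter = Counter()
    for read in reads:
        for name, sequence in hairpins.items():
            if match_with_3p_tolerance(read, sequence) is not None:
                counts[name] += 1
                break
    return counts


# Toy data (hypothetical sequences, not real miRBase entries).
toy_hairpin = "GGAUCCUAGCAUGCAUUGACCGUAAGGCUU"
exact_read = toy_hairpin[5:25]
modified_read = toy_hairpin[5:23] + "UU"   # two 3' nucleotides replaced
print(match_with_3p_tolerance(exact_read, toy_hairpin))
print(match_with_3p_tolerance(modified_read, toy_hairpin))
```

Restricting mismatches to the tail reflects the point that post-transcriptional modifications typically affect the 3’ end of a miRNA, so a read altered at its 3’ end can still be assigned to its precursor while a 5’ mismatch is rejected.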

## Abstract

Small non-coding RNAs (sncRNA; < 200 nucleotide length) are of increasing research interest due to their key regulatory roles in a host of fundamental biological processes. For example, microRNAs (miRNAs), a specific class of sncRNAs, regulate gene expression through messenger RNA (mRNA) interactions, and their dysregulation is associated with disease. Classifying sncRNAs is an important bioinformatic task in small RNA-sequencing pipelines. Here we have developed an aligner called PymiRa, written in Python, to identify and quantify miRNAs from FASTA/FASTQ sequencing files. Unlike other approaches, PymiRa utilises a Burrows-Wheeler algorithm to align an input file against a reference hairpin precursor FASTA file derived from miRBase, the online miRNA registry, permitting up to two mismatches at the 3’ end of a read. Previous tools used either a Burrows-Wheeler genome alignment or dynamic programming alignment to precursors; we demonstrate that combining both approaches yields improved results and efficiency. Importantly, the PymiRa aligner accounts for 3’ post-transcriptional modifications to miRNAs that typically occur. PymiRa is a fast, accurate, and publicly accessible aligner available via GitHub and/or a webserver for sncRNA identification, including miRNAs, enabling accurate counts to be produced as part of a small RNA-sequencing pipeline. PymiRa will undergo relevant revisions over time e.g., with miRBase version updates. The PymiRa aligner will facilitate a deeper biological understanding of the landscape of sncRNA expression in normal physiological conditions and their dysregulation in disease states, including cancer.
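For readers unfamiliar with Burrows-Wheeler alignment, the following is a textbook sketch of exact matching by backward search over a Burrows-Wheeler transform. It illustrates the general technique only; it is not taken from PymiRa, handles neither mismatches nor multiple reference sequences, and builds the suffix array naively, which is acceptable only for short toy inputs. The function names and the toy sequence are assumptions.

```python
from typing import Dict, List, Tuple

FMIndex = Tuple[List[int], Dict[str, int], Dict[str, List[int]]]


def build_fm_index(text: str) -> FMIndex:
    """Build a suffix array, first-column offsets, and occurrence table for text."""
    text = text + "$"  # sentinel; sorts before A, C, G, U in ASCII
    # Naive suffix array: sort suffix start positions by the suffix itself.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c] = number of characters in text strictly smaller than c.
    first: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][k] = number of occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return suffix_array, first, occ


def find_exact(pattern: str, index: FMIndex) -> List[int]:
    """Return sorted start positions of pattern in the indexed text."""
    suffix_array, first, occ = index
    if not pattern or "$" in pattern:
        return []
    lo, hi = 0, len(suffix_array)
    # Backward search: extend the match one character at a time, right to left.
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


# Toy reference (hypothetical sequence, not a real hairpin).
index = build_fm_index("GAUUACAGAUUACA")
print(find_exact("AUUACA", index))
```

Each step of the backward search costs a constant number of table lookups, so query time depends on the read length and not on the size of the reference, which is why Burrows-Wheeler indexes are a common choice for short-read aligners.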

RNA-sequencing is a popular methodology for studying levels of RNAs in different biological samples, and it generates large amounts of data. There is an increasing volume of research into small RNAs and how they may be used to detect disease and/or serve as targets for new treatments. However, identifying these RNAs accurately and rapidly from sequencing data is challenging. Current methodologies often struggle to identify these RNAs correctly and are slow to count them. Here we present PymiRa, a rapid, accurate, and accessible tool to identify small RNAs from sequencing experiments. Studying how small RNA levels differ between biological samples can help find new ways to detect and treat diseases.

## Linked entities

- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Genes:** DROSHA (drosha ribonuclease III) [NCBI Gene 29102] {aka ETOHI2, HSA242976, RANSE3L, RN3, RNASE3L, RNASEN}, DICER1 (dicer 1, ribonuclease III) [NCBI Gene 23405] {aka DCR1, Dicer, Dicer1e, GLOW, HERNA, K12H4.8-LIKE}, TUT1 (terminal uridylyl transferase 1, U6 snRNA-specific) [NCBI Gene 64852] {aka PAPD2, RBM21, STARPAP, TENT1, TUTase, URLC6}, XPO5 (exportin 5) [NCBI Gene 57510] {aka exp5}
- **Diseases:** cardiovascular disease (MESH:D002318), diffuse large B-cell lymphomas (MESH:D016403), malignant germ cell tumours (MESH:D009373), cancer (MESH:D009369)
- **Chemicals:** adenosine (MESH:D000241), inosine (MESH:D007288)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13020842/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13020842/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC13020842/full.md

---
Source: https://tomesphere.com/paper/PMC13020842