# Popcorn: prediction of short coding and noncoding genomic sequences in prokaryotes

**Authors:** Alison Kyrouz, Lian Liu, Lixin Qin, Brian Tjaden

PMC · DOI: 10.1093/bioinformatics/btaf250 · Bioinformatics · 2025-04-25

## TL;DR

Popcorn is a machine learning tool that classifies prokaryotic genomic sequences, including short ones, as coding or noncoding.

## Contribution

The paper introduces Popcorn (PrOkaryotic Prediction of Coding OR Noncoding), a machine learning method for determining whether prokaryotic sequences are coding or noncoding.

## Key findings

- Popcorn effectively distinguishes coding sORFs from noncoding RNAs in prokaryotic sequences.
- Popcorn can help assess whether the small, unannotated transcripts commonly seen in RNA-seq experiments are novel coding sORFs or small regulatory RNAs.

## Abstract

The most challenging prokaryotic genes to identify often correspond to short ORFs (sORFs) encoding small proteins or to noncoding RNAs. RNA-seq experiments commonly evince small transcripts that do not correspond to annotated genes and are candidates for novel coding sORFs or small regulatory RNAs, but it can be difficult to accurately assess whether the numerous small transcripts are coding or not. We present Popcorn (PrOkaryotic Prediction of Coding OR Noncoding), a novel machine learning method for determining whether prokaryotic sequences are coding or noncoding. We find that Popcorn is effective in distinguishing coding from noncoding sequences, including coding sORFs and noncoding RNAs.

Freely available for use on the web at https://cs.wellesley.edu/~btjaden/Popcorn. Source code available at https://github.com/btjaden/Popcorn and https://doi.org/10.5281/zenodo.15120075.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Escherichia coli (E. coli, species) [taxon 562]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12054974/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12054974/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC12054974/full.md

---
Source: https://tomesphere.com/paper/PMC12054974