# Mutational constraint analysis workflow for overlapping short open reading frames and genomic neighbors

**Authors:** Martin Danner, Matthias Begemann, Florian Kraft, Miriam Elbracht, Ingo Kurth, Jeremias Krause

PMC · DOI: 10.1186/s12864-025-11444-w · BMC Genomics · 2025-03-14

## TL;DR

This paper introduces a workflow that uses population-scale variant data from gnomAD 4.0 to analyze genetic constraint on short open reading frames (sORFs) and their genomic neighbors.

## Contribution

The study provides the first population-level constraint metrics for sORFs and their genomic neighbors, derived from the gnomAD 4.0 dataset.

## Key findings

- sORFs are mostly found in moderately constrained genomic regions.
- Within the GENCODE dataset, a subset of sORFs is as highly constrained as highly constrained canonical genes.
- The analysis helps identify potentially functional sORFs for further study.
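Judging an sORF's "genomic context" requires deciding which annotated features overlap it and which lie nearby. The sketch below shows one simple way to do that with interval arithmetic. It is not the authors' workflow: the `Feature` type, the helper names, the BED-style half-open coordinates, and the 10 kb neighborhood window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic interval. Hypothetical type, BED-style half-open coordinates."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two half-open intervals on the same chromosome share any base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def distance(a: Feature, b: Feature) -> int:
    """Gap in bp between two non-overlapping features on the same chromosome."""
    if b.start >= a.end:   # b lies downstream of a
        return b.start - a.end
    return a.start - b.end  # b lies upstream of a


def neighbors(sorf: Feature, genes: list[Feature],
              window: int = 10_000) -> tuple[list[str], list[str]]:
    """Split genes into those overlapping the sORF and those within `window` bp.

    The window size is an assumed parameter, not one taken from the paper.
    Nearby genes are returned sorted by distance to the sORF.
    """
    overlapping = [g.name for g in genes if overlaps(sorf, g)]
    nearby = sorted(
        (distance(sorf, g), g.name)
        for g in genes
        if g.chrom == sorf.chrom
        and not overlaps(sorf, g)
        and distance(sorf, g) <= window
    )
    return overlapping, [name for _, name in nearby]
```

Keeping overlapping and flanking features separate matters because an sORF nested inside a constrained gene can inherit that gene's constraint signal, whereas a flanking neighbor only describes the surrounding context.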

## Abstract

Understanding the dark genome is a priority following the complete sequencing of the human genome. Short open reading frames (sORFs) are a largely unexplored class of dark-genome elements that can potentially be translated into microproteins. The definitive number of coding and regulatory sORFs is not known; however, they could account for up to 1–2% of the human genome, which is the same order of magnitude as the fraction occupied by canonical coding genes. Clinical relevance has already been demonstrated for a few sORFs, but the biological function of most potential sORFs remains unclear. A major limitation in predicting their disease relevance from large-scale genomic data is that no population-level constraint metrics for genetic variants in sORFs are available yet. To overcome this, we used the recently released gnomAD 4.0 dataset to analyze the constraint of a consensus set of sORFs and their genomic neighbors. We demonstrate that sORFs are mostly embedded in a moderately constrained genomic context, but within the GENCODE dataset we identified a subset of highly constrained sORFs comparable to highly constrained canonical genes.
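Population constraint is commonly expressed as the ratio of observed to expected variants (o/e) in a region, where the expected count comes from a mutational model. The sketch below computes o/e together with an upper confidence bound in the style of gnomAD's LOEUF: it scans candidate ratios on a grid and weights each one by the Poisson likelihood of the observed count. This is one widely used style of metric, not necessarily the one the authors computed. The function names, the 0 to 2 grid, the 0.001 step and the 95th-percentile cutoff are assumptions, and the expected count is simply taken as an input.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing k events given mean lam."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def oe_with_upper_bound(observed: int, expected: float,
                        max_ratio: float = 2.0, step: float = 0.001,
                        upper_q: float = 0.95) -> tuple[float, float]:
    """Return (o/e ratio, upper bound of a 90% interval on o/e).

    Hypothetical helper: each candidate ratio x on the grid is weighted by
    the Poisson likelihood of `observed` given mean x * expected; the upper
    bound is the first x at which the normalized cumulative weight reaches
    upper_q.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    n_steps = int(round(max_ratio / step))
    grid = [i * step for i in range(n_steps + 1)]
    weights = [poisson_pmf(observed, x * expected) for x in grid]
    total = sum(weights)
    cumulative = 0.0
    upper = grid[-1]
    for x, w in zip(grid, weights):
        cumulative += w
        if cumulative / total >= upper_q:
            upper = x
            break
    return observed / expected, upper
```

A low upper bound means that far fewer variants were observed than the mutational model predicts, which indicates strong constraint. Because sORFs are short, their expected counts are small and the interval is wide. For that reason, a bound-based metric is more robust than the raw o/e point estimate when comparing sORFs with canonical genes.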

The online version contains supplementary material available at 10.1186/s12864-025-11444-w.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11909976/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11909976/full.md

## References

2 references — full list in the complete paper: https://tomesphere.com/paper/PMC11909976/full.md

---
Source: https://tomesphere.com/paper/PMC11909976