# An expanded reference catalog of translated open reading frames for biomedical research

**Authors:** Sonia Chothani, Jorge Ruiz-Orera, Jack A S Tierney, Michal I Swirski, Hakon Tjeldnes, Leron W Kok, Jim Clauwaert, Eric W Deutsch, M Mar Alba, Julie L Aspden, Pavel V Baranov, Ariel Alejandro Bazzini, Elspeth A Bruford, Marie A Brunet, Tristan Cardon, Anne-Ruxandra Carvunis, Claudio Casola, Jyoti Sharma Choudhary, Kellie Dean, Pouya Faridi, Ivo Fierro-Monti, Isabelle Fournier, Adam Frankish, Mark Gerstein, Norbert Hubner, Yunzhe Jiang, Manolis Kellis, Thomas F Martinez, Gerben Menschaert, Pengyu Ni, Sandra Orchard, Xavier Roucou, Joel Rozowsky, Michel Salzet, Mauro Siragusa, Sarah Slavoff, Nicola Ternette, Juan Antonio Vizcaino, Aaron Wacholder, Wei Wu, Zhi Xie, Yucheng T Yang, Robert L Moritz, Eivind Valen, Jonathan Mudge, Sebastiaan van Heesch, John R Prensner, Owen J L Rackham

PMC · DOI: 10.1093/nar/gkag234 · 2026-03-24

## TL;DR

This paper introduces an expanded and refined catalog of non-canonical open reading frames (ncORFs) in the human genome, providing a more comprehensive and reliable reference for biomedical research.

## Contribution

The paper presents a data-driven framework (translation signature scores) to assess the accumulated translation evidence for each ncORF, and uses it to derive a primary set whose evidence is on par with canonical protein-coding genes.

## Key findings

- The catalog now includes 28,359 ncORFs, nearly four times the size of the previous version, after incorporating more datasets and lifting constraints on ORF length and start codon.
- A subset of 10,127 ncORFs with strong translation evidence was identified as a reliable reference for research.
- The update continues a community-driven effort to make ncORFs accessible and actionable for the broader research community, and further iterations of the catalog are planned.
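
The relationship between the full catalog and the primary set can be illustrated with a minimal sketch. This summary does not describe how the translation signature scores are computed or thresholded, so everything below other than the two reported counts is an assumption for illustration: the `Orf` record, the `score` field, and the rule of taking a low percentile of canonical coding-sequence scores as the cutoff are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class Orf:
    orf_id: str   # hypothetical identifier, not a real catalog ID
    score: float  # hypothetical per-ORF translation evidence score


def canonical_cutoff(canonical_scores, percentile=5):
    """Return the score at `percentile` of the canonical CDS scores.

    Assumed rule: an ncORF counts as "on par with canonical genes" if it
    scores at least as high as the weakest few percent of canonical CDSs.
    """
    cuts = quantiles(canonical_scores, n=100, method="inclusive")
    return cuts[percentile - 1]


def primary_set(ncorfs, cutoff):
    """Keep only ncORFs whose score reaches the cutoff."""
    return [orf for orf in ncorfs if orf.score >= cutoff]


# Counts reported in the paper.
TOTAL_NCORFS = 28_359
PRIMARY_NCORFS = 10_127
primary_fraction = PRIMARY_NCORFS / TOTAL_NCORFS  # about 0.357

# Toy data, invented purely to exercise the functions above.
toy_canonical = [i / 100 for i in range(101)]
toy_ncorfs = [Orf("toy_1", 0.01), Orf("toy_2", 0.50), Orf("toy_3", 0.90)]
toy_cutoff = canonical_cutoff(toy_canonical)
toy_primary = primary_set(toy_ncorfs, toy_cutoff)
```

On the reported counts, the primary set is a little over a third of the full catalog, which is why the two sets serve different purposes: the full set for breadth, the primary set for analyses that need the most reproducible translation evidence.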

## Abstract

Non-canonical (i.e. unannotated) open reading frames (ncORFs) have until recently been omitted from reference genome annotations, despite evidence of their translation, limiting their incorporation into biomedical research. To address this, in 2022, we initiated the TransCODE consortium and built the first community-driven consensus catalog of human ncORFs, which was openly distributed to the research community via Ensembl-GENCODE. While this catalog represented a starting point for reference ncORF annotation, major technical and scientific issues remained. In particular, this initial catalog had no standardized framework to judge the evidence of translation for individual ncORFs. Here, we present an expanded and refined catalog of the human reference annotation of ncORFs. By incorporating more datasets and by lifting constraints on ORF length and start codon, we define a comprehensive set of 28 359 ncORFs that is nearly four times the size of the previous catalog. Furthermore, to aid users who wish to work with ncORFs with the strongest and most reproducible signals of translation, we utilized a data-driven framework (i.e. translation signature scores) to assess the accumulated evidence for any individual ncORF. Using this approach, we derive a subset of 10 127 ncORFs with translation evidence on par with canonical protein-coding genes, which we refer to as the primary set. This set can serve as a reliable reference for downstream analyses and validation, with a particular emphasis on high quality. Overall, this update reflects continuous community-driven efforts to make ncORFs accessible and actionable to the broader research public, and further iterations of the catalog will continue to expand and refine this resource.

## Full-text entities

- **Genes:** IGHV1-2 (immunoglobulin heavy variable 1-2) [NCBI Gene 28474] {aka IGHV12, V35}, MTLN (mitoregulin) [NCBI Gene 205251] {aka LEMP, LINC00116, MOXI, MPM, NCRNA00116, SMIM37}, MRLN (myoregulin) [NCBI Gene 100507027] {aka LINC00948, Linc-RAM, M1, MLN, MUSER1}, ATF4 (activating transcription factor 4) [NCBI Gene 468] {aka CREB-2, CREB2, TAXREB67, TXREB}, LMBRD2 (LMBR1 domain containing 2) [NCBI Gene 92255] {aka DENBA}
- **Diseases:** Cancer (MESH:D009369), Brain Tumor (MESH:D001932)
- **Chemicals:** homoharringtonine (MESH:D000077863), lactimidomycin (MESH:C077633)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13010147/full.md

---
Source: https://tomesphere.com/paper/PMC13010147