An expanded reference catalog of translated open reading frames for biomedical research
Sonia Chothani, Jorge Ruiz-Orera, Jack A S Tierney, Michal I Swirski, Hakon Tjeldnes, Leron W Kok, Jim Clauwaert, Eric W Deutsch, M Mar Alba, Julie L Aspden, Pavel V Baranov, Ariel Alejandro Bazzini, Elspeth A Bruford, Marie A Brunet, Tristan Cardon, Anne-Ruxandra Carvunis

TL;DR
This paper introduces an expanded and refined catalog of non-canonical open reading frames (ncORFs) in the human genome, providing a more comprehensive and reliable reference for biomedical research.
Contribution
The paper presents a data-driven framework to assess translation evidence for ncORFs, resulting in a high-quality primary set whose translation evidence is comparable to that of canonical genes.
Findings
The catalog now includes 28,359 ncORFs, nearly four times the size of the previous version.
A subset of 10,127 ncORFs with strong translation evidence was identified as a reliable reference for research.
The updated catalog is community-driven and aims to improve accessibility and utility of ncORFs in biomedical studies.
Abstract
Non-canonical (i.e. unannotated) open reading frames (ncORFs) have until recently been omitted from reference genome annotations, despite evidence of their translation, limiting their incorporation into biomedical research. To address this, in 2022, we initiated the TransCODE consortium and built the first community-driven consensus catalog of human ncORFs, which was openly distributed to the research community via Ensembl-GENCODE. While this catalog represented a starting point for reference ncORF annotation, major technical and scientific issues remained. In particular, this initial catalog had no standardized framework to judge the evidence of translation for individual ncORFs. Here, we present an expanded and refined catalog of the human reference annotation of ncORFs. By incorporating more datasets and by lifting constraints on ORF length and start codon, we define a comprehensive…
Taxonomy
Topics: RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies · Biomedical Text Mining and Ontologies
