# Boundary-associated propagation of a processed pseudogene dissects pre-existing limitations of genome annotation in the T2T era

**Authors:** Min-Gyu Lee

PMC · DOI: 10.1186/s13100-026-00394-z · 2026-02-17

## TL;DR

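Using T2T assemblies, this case study suggests that a single processed pseudogene insertion can be carried by segmental duplication into multiple annotated loci, which origin-based annotation may misread as independent insertions. It argues for annotation that incorporates structural and evolutionary context.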

## Contribution

Using the SEPTIN14P-CICP locus family, the study demonstrates that processed pseudogene copies can be secondarily propagated by segmental duplication and then misclassified as independent insertions when evolutionary and structural context is not considered.

## Key findings

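- The genomic window spanning the SEPTIN14 3′ terminal exon and the adjacent processed pseudogene CICP12 is dispersed into multiple segmental duplication-associated units across great apes; genome-wide, annotated CICP loci preferentially localize within segmental duplication blocks and accumulate near pericentromeric or subtelomeric regions.
- Pervasive purifying selection on the full-length SEPTIN14 coding sequence and its 3′ terminal exon argues against the terminal exon having been newly formed through segmental duplication.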
- Secondary propagation of RNA-derived insertions can lead to multiple annotated loci in duplication-rich regions.

## Abstract

Processed pseudogenes and retrogenes are defined by their RNA-mediated origin and, by virtue of this origin-based definition, are often interpreted as discrete genomic insertions. The completion of telomere-to-telomere (T2T) reference assemblies has substantially improved the resolution of segmental duplication architectures and centromeric satellite sequences that were previously inaccessible, allowing genomic structural contexts that were effectively invisible in earlier references to be directly examined.

Using the SEPTIN14P-CICP locus family as a case study, chain-based comparative analyses showed that a genomic window spanning the SEPTIN14 3′ terminal exon and the adjacent processed pseudogene CICP12 is dispersed into multiple segmental duplication-associated units across great apes, rather than being maintained as a single orthologous locus. Genome-wide analyses further indicated that annotated CICP loci preferentially localize within segmental duplication blocks and accumulate near pericentromeric or subtelomeric regions. Despite this duplication-associated dispersion, codon-based selection analyses revealed pervasive purifying selection acting on the full-length SEPTIN14 coding sequence and its 3′ terminal exon, arguing against a model in which the terminal exon was newly formed through segmental duplication. Together, these results show that when highly conserved, strongly constrained coding regions are embedded within segmental duplication-rich regions, co-dispersed processed pseudogene copies can be interpreted as distinct from independently generated LINE-1-mediated insertions and as reflecting secondary structural propagation.
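The genome-wide observation that annotated CICP loci preferentially fall within segmental duplication blocks amounts to an interval-overlap question. The following is a minimal Python sketch of that kind of check, not the pipeline used in the study: the function names, the half-open `(chrom, start, end)` coordinate convention, and any input coordinates are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def index_blocks(blocks):
    """Group (chrom, start, end) blocks by chromosome and merge them."""
    by_chrom = defaultdict(list)
    for chrom, start, end in blocks:
        by_chrom[chrom].append((start, end))
    return {chrom: merge_intervals(ivs) for chrom, ivs in by_chrom.items()}


def overlaps_block(index, chrom, start, end):
    """Return True if the locus [start, end) overlaps any merged block."""
    ivs = index.get(chrom, [])
    starts = [s for s, _ in ivs]
    # Last merged block that starts before the locus ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ivs[i][1] > start


def fraction_in_blocks(loci, blocks):
    """Fraction of loci overlapping at least one duplication block."""
    index = index_blocks(blocks)
    hits = sum(overlaps_block(index, c, s, e) for c, s, e in loci)
    return hits / len(loci) if loci else 0.0
```

In practice the loci and blocks would come from annotation and segmental duplication tracks, and an enrichment claim would additionally need a null expectation (for example, from randomly placed loci), which this sketch does not compute.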

When considered in light of origin-based definitions of processed pseudogenes and retrogenes, and specifically within duplication-rich and structurally unstable genomic regions resolved by T2T-level assemblies, these results suggest that multiple annotated loci can arise through secondary propagation of a single RNA-derived insertion. Under such contexts, incorporation of selective constraint and cross-species conservation enables more reliable distinction between source insertions and their secondarily propagated copies. This case study highlights a limitation of current annotation frameworks and demonstrates the need for more precise annotation that incorporates evolutionary and structural context in the T2T era.
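As a rough illustration of what "selective constraint" means at the codon level, the sketch below tallies synonymous and nonsynonymous codon differences between two aligned, in-frame coding sequences. This is a deliberately crude count under stated assumptions (standard genetic code, gapless in-frame alignment, hypothetical function name); it is not the codon-based selection analysis used in the study, which the abstract does not specify, and it does not estimate dN/dS.

```python
# Standard genetic code, built in TCAG order.
_BASES = "TCAG"
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for pos in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[pos:pos + 3], seq_b[pos:pos + 3]
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: not scored
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous
```

A strong excess of synonymous over nonsynonymous differences is the qualitative pattern expected under purifying selection; a formal test requires normalizing by the number of synonymous and nonsynonymous sites and a proper codon substitution model.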

The online version contains supplementary material available at 10.1186/s13100-026-00394-z.

## Linked entities

- **Genes:** SEPTIN14 (septin 14) [NCBI Gene 346288], CICP12 (capicua transcriptional repressor pseudogene 12) [NCBI Gene 100420224]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13015009/full.md

---
Source: https://tomesphere.com/paper/PMC13015009