# Identification of cell-type-specific, transcriptionally active transposable elements using long-read RNA-sequencing data-based comprehensive annotation

**Authors:** Chaemin Lim, Hyunsu An, Jihwan Park

PMC · DOI: 10.1186/s44342-025-00048-1 · Genomics & Informatics · 2025-08-06

## TL;DR

This study creates a comprehensive annotation of transposable element-derived transcripts using long-read RNA-sequencing data to identify cell-type-specific activity in human tissues.

## Contribution

The study introduces a novel, accurate TE-derived transcript annotation using LR RNA-seq data, enabling detection of cell-type-specific transcripts.

## Key findings

- The TE annotation outperformed RepeatMasker and GENCODE in detecting TE-derived transcripts.
- Cell-type-specific TE-derived transcripts were identified in multiple human tissues.
- Alternative transcription end sites and TE-nonTE gene fusions were confirmed.

## Abstract

The biological functions of transposable element (TE)-derived transcripts in physiological development and in disease development and progression have been reported previously. However, research on locus-specific TE-derived transcript expression in various human cell types remains limited.

We processed 2596 publicly available human long-read RNA-sequencing (LR RNA-seq) datasets covering 21 organs and 71 cell lines, from both healthy individuals and patients with various conditions, to compile a TE-derived transcript annotation. We established a pipeline for assembling transcripts containing TE sequences to measure transcriptionally active TE-derived transcripts in diverse tissues and cell types. Next, we applied our TE annotation to the Genotype-Tissue Expression (GTEx) single-cell RNA-sequencing (scRNA-seq) data from eight tissues.
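The abstract does not spell out how transcripts "containing TE sequences" are selected, so the following is only a minimal sketch of one plausible step: flagging assembled transcripts whose exons overlap known TE intervals (for example, RepeatMasker-style coordinates). The function names, the tuple layouts, and the toy coordinates are all hypothetical and are not taken from the paper; intervals are assumed to be half-open, BED-style `[start, end)`.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def te_containing_transcripts(exons, te_intervals):
    """Map each transcript to the sorted TE names that any of its exons overlaps.

    exons:        iterable of (transcript_id, chrom, start, end)   # hypothetical layout
    te_intervals: iterable of (chrom, start, end, te_name)         # hypothetical layout
    Transcripts with no TE overlap are left out of the result.
    """
    # Group TE intervals by chromosome so each exon is only compared
    # against TEs on its own chromosome.
    te_by_chrom = defaultdict(list)
    for chrom, start, end, name in te_intervals:
        te_by_chrom[chrom].append((start, end, name))

    hits = defaultdict(set)
    for tx_id, chrom, start, end in exons:
        for te_start, te_end, name in te_by_chrom[chrom]:
            if overlaps(start, end, te_start, te_end):
                hits[tx_id].add(name)
    return {tx_id: sorted(names) for tx_id, names in hits.items()}


# Toy data, invented purely for illustration.
exons = [
    ("tx1", "chr1", 100, 200),
    ("tx1", "chr1", 500, 600),
    ("tx2", "chr1", 1000, 1100),
    ("tx3", "chr2", 50, 150),
]
tes = [
    ("chr1", 180, 260, "L1_example"),
    ("chr1", 550, 580, "Alu_example"),
    ("chr2", 150, 300, "ERV_example"),
]

result = te_containing_transcripts(exons, tes)
print(result)
```

The nested loop is quadratic per chromosome, which is fine for a toy example; at genome scale an interval tree or a sorted sweep would be the usual replacement. Note that `tx3` is not flagged because its exon ends exactly where the TE begins, which under half-open coordinates is adjacency rather than overlap.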

We constructed the first transcriptome-based TE annotation using massive amounts of human LR RNA-seq data for use as a comprehensive reference to detect locus-specific TE-derived transcripts. Our annotation showed better detection accuracy for TE-derived transcripts than the RepeatMasker and GENCODE nonTE gene annotations. This annotation enabled the identification of novel TE-derived transcripts and their isoforms. We also identified alternative transcription end sites for long noncoding genes and confirmed previously annotated TE-nonTE gene fusion transcripts. Next, we applied our TE-derived transcript annotation to public scRNA-seq data from various human tissues and identified several cell-type-specific TE-derived transcripts in a locus-specific manner.
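The abstract does not say how cell-type specificity was scored. As a generic illustration only, and not the authors' method, one widely used measure is the tau specificity index, which is 0 for a transcript expressed evenly across cell types and 1 for a transcript expressed in exactly one. The function name and the expression values below are hypothetical.

```python
def tau_index(expression):
    """Tau specificity index over mean expression values per cell type.

    tau = sum(1 - x_i / x_max) / (n - 1), for non-negative values x_i.
    Returns None when the index is undefined (fewer than two cell types
    or no expression at all).
    """
    values = list(expression)
    n = len(values)
    if n < 2:
        return None
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    x_max = max(values)
    if x_max == 0:
        return None
    return sum(1 - v / x_max for v in values) / (n - 1)


# Hypothetical mean expression of three transcripts across four cell types.
specific = tau_index([10.0, 0.0, 0.0, 0.0])   # expressed in one cell type only
uniform = tau_index([5.0, 5.0, 5.0, 5.0])     # expressed evenly everywhere
partial = tau_index([4.0, 2.0, 0.0])          # intermediate case
print(specific, uniform, partial)
```

A threshold on tau is a judgment call that depends on the dataset, so none is suggested here.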

We generated a comprehensive, TE-derived transcript annotation using large-scale, LR RNA-seq data. Researchers can use our TE reference annotation to analyze active TE transcripts and their splicing isoforms in specific transcriptome datasets and to detect de novo TE transcripts. The discovery of cell-type-specific TE-derived transcripts may help explain mechanisms underlying the maintenance of cellular identity and provide new insights into the pathological mechanisms of various diseases.

The online version contains supplementary material available at 10.1186/s44342-025-00048-1.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12326599/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12326599/full.md

---
Source: https://tomesphere.com/paper/PMC12326599