# TRANSAID: a hybrid deep learning framework for translation site prediction with integrated biological feature scoring

**Authors:** Yan Li, Boran Wang, Zhen Liu, Wei Wei, Caiyi Fei, Shi Xu, Tiyun Han, Wei Geng, Zengding Wu

PMC · DOI: 10.3389/fbinf.2025.1676149 · Frontiers in Bioinformatics · 2026-01-19

## TL;DR

TRANSAID is a deep learning tool that accurately predicts translation initiation and termination sites in transcripts, improving annotation and discovery of new proteins.

## Contribution

A novel deep learning framework that simultaneously predicts translation initiation (TIS) and termination (TTS) sites from complete transcript sequences, combining integrated biological scoring with cross-species applicability.

## Key findings

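- Trained on both protein-coding (NM) and non-coding (NR) transcripts, TRANSAID correctly identifies 73.61% of NR transcripts as non-coding.
- The integrated biological scoring system raises "perfect ORF prediction" for coding sequences to 94.94% and "correct non-coding prediction" to 82.00%.
- Applied to long-read sequencing data, it identifies previously unannotated protein isoforms, with a 76.28% validation rate by mass spectrometry.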

## Abstract

Translation initiation and termination are critical regulatory checkpoints in protein synthesis, yet accurate computational prediction of their sites remains challenging due to training data biases and the complexity of full-length transcripts.

To address these limitations, we present TRANSAID (TRANSlation AI for Detection), a novel deep learning framework that accurately and simultaneously predicts translation initiation (TIS) and termination (TTS) sites from complete transcript sequences. TRANSAID’s hierarchical architecture efficiently processes long transcripts, capturing both local motifs and long-range dependencies. Crucially, the model was trained on a human transcriptome dataset that was rigorously partitioned at the gene level to prevent data leakage and included both protein-coding (NM) and non-coding (NR) transcripts.
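The gene-level partitioning mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the hash-based assignment, the split fractions, and the function names `assign_split` and `split_by_gene` are all assumptions made for illustration. The point it shows is that the split is decided per gene, so every transcript of a gene lands in the same partition and isoforms cannot leak between training and test data.

```python
import hashlib


def assign_split(gene_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a gene ID to 'train', 'val', or 'test' deterministically.

    Hypothetical helper: hashing the gene ID (not the transcript ID)
    guarantees that all isoforms of one gene share a partition.
    """
    digest = hashlib.sha256(gene_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_by_gene(transcripts):
    """Partition (transcript_id, gene_id) pairs by the gene's split."""
    splits = {"train": [], "val": [], "test": []}
    for transcript_id, gene_id in transcripts:
        splits[assign_split(gene_id)].append(transcript_id)
    return splits
```

A random split over transcripts, by contrast, could place two near-identical isoforms of one gene on opposite sides of the train/test boundary, which would inflate test performance.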

This mixed-training strategy enables TRANSAID to achieve high fidelity, correctly identifying 73.61% of NR transcripts as non-coding. Performance is further enhanced by an integrated biological scoring system, improving “perfect ORF prediction” for coding sequences to 94.94% and “correct non-coding prediction” to 82.00%. The human-trained model demonstrates remarkable cross-species applicability, maintaining high accuracy on organisms from mammals to yeast. Beyond annotation, TRANSAID serves as a powerful discovery tool for novel coding events. When applied to long-read sequencing data, it accurately identified previously unannotated protein isoforms validated by mass spectrometry (76.28% validation rate). Furthermore, homology searches of high-scoring ORFs predicted within NR transcripts suggest a strong potential for identifying cryptic translation events.
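To make the idea of pairing a predicted TIS with a predicted TTS concrete, the sketch below checks whether two positions delimit a structurally valid open reading frame. The criteria (ATG start, standard stop codon, shared reading frame, no earlier in-frame stop) and the 0-based coordinate convention are assumptions chosen for illustration; the abstract does not define how "perfect ORF prediction" is scored, and the function name `is_consistent_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_consistent_orf(seq: str, tis: int, tts: int) -> bool:
    """Check that a (TIS, TTS) pair delimits a structurally valid ORF.

    Assumed convention: `tis` is the 0-based index of the first base of
    the start codon, `tts` the 0-based index of the first base of the
    stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    if tis < 0 or tts <= tis or tts + 3 > len(seq):
        return False
    if (tts - tis) % 3 != 0:  # start and stop must share a frame
        return False
    if seq[tis:tis + 3] != "ATG" or seq[tts:tts + 3] not in STOP_CODONS:
        return False
    # No in-frame stop codon may occur before the predicted stop.
    return all(seq[i:i + 3] not in STOP_CODONS for i in range(tis, tts, 3))
```

For example, `is_consistent_orf("GGATGAAACCCTAGTT", 2, 11)` accepts the ORF `ATG AAA CCC TAG`, while shifting the stop position to 12 breaks the frame and is rejected.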

As a fully documented open-source tool with a user-friendly web server, TRANSAID provides a powerful and accessible resource for improving transcriptome annotation and proteomic discovery.

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12862215/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12862215/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12862215/full.md

---
Source: https://tomesphere.com/paper/PMC12862215