# ADAT: novel time-series-aware adaptive transformer architecture for sign language translation

**Authors:** Nada Shahin, Leila Ismail

PMC · DOI: 10.1038/s41598-026-36293-9 · Scientific Reports · 2026-01-28

## TL;DR

This paper introduces ADAT, a Transformer-based model for sign language translation that improves accuracy and training efficiency by modeling both short- and long-range temporal dependencies between gestures.

## Contribution

ADAT is an adaptive Transformer architecture that combines convolutional feature extraction, log-sparse self-attention, and an adaptive gating mechanism in a dual-stream design for efficient sign language translation.
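
To see roughly why log-sparse attention lowers the cost of attention, the sketch below builds a causal mask in the style of the LogSparse Transformer (Li et al., 2019). In that pattern, each frame attends to itself and to the frames 1, 2, 4, 8, ... steps back. This pattern is an assumption: the summary does not give ADAT's exact sparsity pattern, and the helper names (`log_sparse_indices`, `log_sparse_mask`, `attended_cells`) are hypothetical. Each row keeps O(log L) entries, so the mask has O(L log L) cells, compared with the L² cells of dense attention.

```python
from typing import List


def log_sparse_indices(i: int) -> List[int]:
    """Key positions that query i may attend to under an ASSUMED causal
    log-sparse pattern: itself plus i - 2**k for k = 0, 1, 2, ...
    (LogSparse-style; ADAT's exact variant may differ)."""
    keys = [i]
    step = 1
    while i - step >= 0:
        keys.append(i - step)
        step *= 2  # exponentially growing look-back distance
    return sorted(keys)


def log_sparse_mask(length: int) -> List[List[bool]]:
    """Boolean mask where mask[i][j] is True if query i may attend to key j."""
    mask = [[False] * length for _ in range(length)]
    for i in range(length):
        for j in log_sparse_indices(i):
            mask[i][j] = True
    return mask


def attended_cells(mask: List[List[bool]]) -> int:
    """Number of query-key scores that actually have to be computed."""
    return sum(row.count(True) for row in mask)


if __name__ == "__main__":
    L = 256  # hypothetical number of video frames in one sign sequence
    mask = log_sparse_mask(L)
    print("dense cells:", L * L, "log-sparse cells:", attended_cells(mask))
```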

## Key findings

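- In sign-to-gloss-to-text translation, ADAT improves BLEU-4 by at least 0.1% over state-of-the-art baselines and reduces training time by an average of 21% across datasets.
- In sign-to-text translation, ADAT improves BLEU-4 by at least 0.5% over Transformer-based encoder-decoder baselines, with an average training speedup of 21.8%.
- Against encoder-only and decoder-only baselines in sign-to-text translation, ADAT is at least 0.7% more accurate but up to 12.1% slower because of its dual-stream structure.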
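
These findings are reported in BLEU-4. For reference, here is a minimal pure-Python sketch of corpus-level BLEU-4 with one reference per sentence, uniform 1- to 4-gram weights, a brevity penalty, no smoothing, and whitespace tokens. It is not the official evaluation script. The summary does not state which toolkit or tokenization the paper used, and the function names and the example sentence are hypothetical. Scores are on a 0 to 100 scale.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses: List[List[str]], references: List[List[str]]) -> float:
    """Corpus-level BLEU-4 with one reference per hypothesis, uniform weights,
    brevity penalty and no smoothing. Returns a score on a 0-100 scale."""
    matches = [0, 0, 0, 0]  # clipped n-gram matches for n = 1..4
    totals = [0, 0, 0, 0]   # hypothesis n-gram counts for n = 1..4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # a zero n-gram precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Penalize output that is shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "am tag regnet es im norden".split()
    ref = "am tag regnet es im norden".split()
    print(corpus_bleu4([hyp], [ref]))  # prints 100.0
```

Corpus-level BLEU pools the n-gram counts from all sentences before it takes the geometric mean. For this reason, it generally differs from the average of per-sentence BLEU scores.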

## Abstract

Current sign language machine translation (SLMT) systems combine recognition of hand movements, facial expressions, and body postures with natural language processing to convert signs into text. Recent approaches use Transformer architectures with positional encoding to model long-range dependencies. However, they are less accurate at capturing fine-grained, short-range temporal dependencies between gestures recorded at high frame rates. Moreover, the quadratic complexity of their attention makes training inefficient. To mitigate these issues, we introduce ADAT, an Adaptive Transformer architecture that combines convolutional feature extraction, log-sparse self-attention, and an adaptive gating mechanism to efficiently model both short- and long-range temporal dependencies in sign language sequences. We evaluate ADAT on three datasets: the benchmark RWTH-PHOENIX-Weather-2014T (PHOENIX14T), ISL-CSLTR, and the newly introduced MedASL, a medical-domain American Sign Language corpus. In sign-to-gloss-to-text translation, ADAT outperforms state-of-the-art baselines, improving BLEU-4 by at least 0.1% and reducing training time by an average of 21% across datasets. In sign-to-text translation, ADAT consistently surpasses Transformer-based encoder-decoder baselines, with BLEU-4 gains of at least 0.5% and an average training speedup of 21.8% across datasets. Compared with encoder-only and decoder-only baselines in sign-to-text translation, ADAT is at least 0.7% more accurate, though up to 12.1% slower due to its dual-stream structure.
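
The following toy sketch shows how two streams could be combined with an adaptive gate. A short-range stream comes from a 1-D convolution over frames, and a long-range stream comes from self-attention. A sigmoid gate then mixes the two per frame and per channel as g·s + (1 − g)·l. Every detail here is an illustrative assumption, and this is not the authors' implementation:

- dense single-head attention with identity projections stands in for ADAT's log-sparse attention;
- the convolution kernel is shared across channels;
- the gate uses per-channel scalar weights in place of a learned layer;
- all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def conv1d_same(frames: List[Vector], kernel: List[float]) -> List[Vector]:
    """Short-range stream (assumed form): 1-D convolution over time with zero
    padding and an odd-length kernel shared by all channels, so the output
    has the same number of frames as the input."""
    pad = len(kernel) // 2
    T, d = len(frames), len(frames[0])
    out = []
    for t in range(T):
        acc = [0.0] * d
        for j, w in enumerate(kernel):
            s = t + j - pad
            if 0 <= s < T:  # positions outside the sequence count as zeros
                for c in range(d):
                    acc[c] += w * frames[s][c]
        out.append(acc)
    return out


def self_attention(frames: List[Vector]) -> List[Vector]:
    """Long-range stream stand-in: dense single-head scaled dot-product
    attention with identity projections (Q = K = V = frames)."""
    d = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, frames)) / z for c in range(d)])
    return out


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(short: List[Vector], long_range: List[Vector],
                 w_short: Vector, w_long: Vector, bias: Vector) -> List[Vector]:
    """Per-frame, per-channel gate g = sigmoid(w_s*s + w_l*l + b) that mixes
    the streams as g*s + (1 - g)*l (hypothetical scalar-weight gate)."""
    fused = []
    for s_vec, l_vec in zip(short, long_range):
        row = []
        for c, (s, l) in enumerate(zip(s_vec, l_vec)):
            g = sigmoid(w_short[c] * s + w_long[c] * l + bias[c])
            row.append(g * s + (1.0 - g) * l)
        fused.append(row)
    return fused


if __name__ == "__main__":
    # Toy per-frame features (e.g. two keypoint coordinates over six frames).
    frames = [[0.1 * t, math.sin(t)] for t in range(6)]
    short = conv1d_same(frames, [0.25, 0.5, 0.25])
    long_range = self_attention(frames)
    fused = gated_fusion(short, long_range, [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
    print(len(fused), len(fused[0]))
```

A gate lets the model decide, per frame and per channel, how much weight to give local motion cues versus sequence-wide context. In this toy version, the gate weights are fixed numbers rather than learned parameters.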

The online version contains supplementary material available at 10.1038/s41598-026-36293-9.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12910085/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12910085/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC12910085/full.md

---
Source: https://tomesphere.com/paper/PMC12910085