# Dependency-aware self-attention for robust neural machine translation

**Authors:** Chuncheng Chi, Fuxue Li, Yichen Liu, Peijun Xie, Hong Yan

PMC · DOI: 10.1371/journal.pone.0342772 · PLOS One · 2026-02-12

## TL;DR

This paper introduces Dependency-Aware Self-Attention (DASA), an attention mechanism for machine translation that incorporates syntactic dependency structure into the Transformer encoder and improves translation quality, especially in low-resource settings.

## Contribution

The Dependency-Aware Self-Attention (DASA) mechanism integrates syntactic dependency structure into the Transformer encoder's attention computation to make translation more robust.

## Key findings

- DASA improves translation performance in low-resource and morphologically rich language settings.
- The method enhances syntactic awareness and robustness by guiding attention to syntactically relevant tokens.
- Experiments show substantial gains in translation quality with the proposed mechanism, most notably when training data is limited.

## Abstract

Neural machine translation (NMT) has significantly benefited from integrating various forms of contextual information. However, conventional Transformer-based translation models primarily rely on self-attention mechanisms that are inherently position-invariant, making them inadequate for effectively capturing explicit syntactic dependencies, especially in low-resource scenarios or morphologically rich languages. To address this limitation, we propose a Dependency-Aware Self-Attention (DASA) mechanism that explicitly incorporates syntactic dependency structures into the attention computation. Our method first leverages a dependency parser to derive syntactic trees from source sentences, generating a dependency distance matrix representing pairwise syntactic proximity. This matrix is transformed into a normalized syntactic bias, which is integrated into the attention mechanism through element-wise modulation of attention logits. By doing so, DASA guides attention weights towards syntactically relevant tokens, enhancing the Transformer encoder’s structural awareness and representation quality. Experimental results demonstrate that our approach substantially improves translation performance and enhances syntactic awareness and robustness, particularly in settings with limited training data.
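The pipeline described in the abstract (dependency tree, then pairwise distance matrix, then normalized bias, then modulated attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the normalization or the exact form of the modulation, so the Gaussian kernel, the `sigma` parameter, the log-bias addition to the logits (equivalent to multiplying the unnormalized attention weights element-wise by the bias), and all function names below are assumptions made for illustration.

```python
import numpy as np


def dependency_distances(heads):
    """Pairwise path lengths between tokens in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    n = len(heads)
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    for child, head in enumerate(heads):
        if head >= 0:
            dist[child, head] = dist[head, child] = 1.0
    # Floyd-Warshall: relax every pair through each intermediate token k.
    for k in range(n):
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


def syntactic_bias(dist, sigma=1.0):
    """Map distances to a bias in (0, 1]; a Gaussian kernel is assumed here."""
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))


def dependency_aware_attention(q, k, v, bias, eps=1e-9):
    """Single-head scaled dot-product attention modulated by a syntactic bias.

    Assumption: log(bias) is added to the logits, which scales the
    unnormalized attention weights element-wise by the bias.
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits + np.log(bias + eps)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: "The cat sat" with The -> cat, cat -> sat, sat = root.
heads = [1, 2, -1]
dist = dependency_distances(heads)
bias = syntactic_bias(dist, sigma=1.0)

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))
output, weights = dependency_aware_attention(q, k, v, bias)
```

In a real system the `heads` list would come from a dependency parser run on the source sentence, and the bias would be applied per head inside the encoder's self-attention layers. Smaller values of the hypothetical `sigma` concentrate attention more tightly on syntactic neighbours.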

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12900430/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12900430/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/PMC12900430/full.md

---
Source: https://tomesphere.com/paper/PMC12900430