Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation
Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, Qi Su

TL;DR
This paper introduces a self-adaptive attention temperature mechanism for neural machine translation, allowing the model to dynamically adjust attention softness for different word types, leading to improved translation quality.
Contribution
The paper proposes Self-Adaptive Control of Temperature (SACT), a mechanism that adjusts an attention temperature at each decoding step so that attention in NMT models can be made sharper or softer as needed.
Findings
Outperforms baseline models on Chinese-English and English-Vietnamese translation tasks.
Demonstrates better focus on relevant source elements during translation.
Produces higher-quality translations with more accurate attention distributions.
Abstract
Most Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism. However, the conventional attention mechanism treats the decoding at each time step equally with the same matrix, which is problematic since the softness of the attention for different types of words (e.g. content words and function words) should differ. Therefore, we propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature. Experimental results on Chinese-English and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and the case study show that our model can attend to the most relevant elements in the source-side contexts and generate…
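The idea of an attention temperature can be illustrated with a minimal sketch. It assumes the temperature is a positive scalar that divides the attention scores before the softmax, so a value below 1 concentrates attention and a value above 1 spreads it out. The abstract does not say how the temperature is predicted at each step, so `adaptive_temperature`, its `gate_logit` input, and the `base` hyperparameter are hypothetical names chosen for illustration and should not be read as the authors' formulation.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Softmax over attention scores divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temperature(gate_logit, base=4.0):
    """Hypothetical mapping from an unbounded scalar to a temperature.

    Assumption for illustration only: tanh bounds the exponent to (-1, 1),
    so the temperature stays within (1/base, base).
    """
    return base ** math.tanh(gate_logit)


def entropy(probs):
    """Shannon entropy in nats; higher means softer (more spread) attention."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Toy attention scores over three source positions.
scores = [2.0, 1.0, 0.1]

# A negative gate value gives a temperature below 1 (concentrated attention);
# a positive one gives a temperature above 1 (diverted attention).
sharp = softmax_with_temperature(scores, adaptive_temperature(-2.0))
soft = softmax_with_temperature(scores, adaptive_temperature(2.0))

print("sharp:", [round(p, 3) for p in sharp], "entropy:", round(entropy(sharp), 3))
print("soft: ", [round(p, 3) for p in soft], "entropy:", round(entropy(soft), 3))
```

In a full model the gate value would presumably be computed from the decoder state at each time step, which is what would let the softness differ between, say, content words and function words; that part is left out of this sketch.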
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
