Semantics-aware Attention Improves Neural Machine Translation
Aviv Slobodkin, Leshem Choshen, Omri Abend

TL;DR
This paper introduces two novel, parameter-free methods to incorporate semantic structures into Transformer models for neural machine translation, leading to consistent improvements across four language pairs.
Contribution
It proposes semantics-aware masking of Transformer attention heads: a Scene-Aware Self-Attention (SASA) head in the encoder and a Scene-Aware Cross-Attention (SACrA) head in the decoder. Both inject semantic information without adding parameters, and both improve translation quality.
Findings
Consistent improvement over both vanilla Transformer and syntax-aware models.
Additional gains in some language pairs when combining semantic and syntactic structures.
Effective across four language pairs.
Abstract
The integration of syntactic structures into Transformer machine translation has shown positive results, but to our knowledge, no work has attempted to do so with semantic structures. In this work we propose two novel parameter-free methods for injecting semantic information into Transformers, both of which rely on semantics-aware masking of (some of) the attention heads. One method operates on the encoder, through a Scene-Aware Self-Attention (SASA) head. The other operates on the decoder, through a Scene-Aware Cross-Attention (SACrA) head. We show a consistent improvement over the vanilla Transformer and syntax-aware models for four language pairs. We further show an additional gain when using both semantic and syntactic structures in some language pairs.
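To illustrate what masking one attention head with semantic structure can look like, here is a minimal sketch of a scene-masked self-attention head in the spirit of SASA. It is not the authors' implementation. Several details are assumptions: the input format (each token given as a set of hypothetical scene IDs from some semantic parser), the rule that two tokens may attend to each other only if they share a scene, the fallback that every token may always attend to itself, and the use of -inf scores for masked positions. The sketch does not cover the cross-attention (SACrA) variant. Because the mask comes from the parse and not from learned weights, it adds no parameters. Passing mask=None gives an ordinary head, which is how the remaining heads in a layer would stay unmasked.

```python
import math
from typing import List, Optional, Sequence, Set

Matrix = List[List[float]]


def scene_mask(token_scenes: Sequence[Set[int]]) -> List[List[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Assumption (hypothetical input format): token_scenes[i] is the set of
    scene IDs that token i belongs to. Tokens that share at least one scene
    may attend to each other; each token may always attend to itself, so no
    row of the mask is empty.
    """
    n = len(token_scenes)
    return [
        [i == j or bool(token_scenes[i] & token_scenes[j]) for j in range(n)]
        for i in range(n)
    ]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; -inf scores receive exactly zero weight."""
    m = max(scores)  # finite, because every row keeps at least one position
    exps = [math.exp(s - m) for s in scores]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q: Matrix, k: Matrix, v: Matrix,
                   mask: Optional[List[List[bool]]] = None) -> Matrix:
    """Scaled dot-product attention for a single head.

    mask=None gives a vanilla head. With a scene mask, disallowed
    positions are scored -inf and therefore contribute nothing.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if mask is not None and not mask[i][j]:
                scores.append(float("-inf"))
            else:
                scores.append(sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d))
        weights = softmax(scores)
        # Weighted sum of value vectors, one output dimension at a time.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

For example, with the hypothetical scene assignment `[{0}, {0}, {1}]`, tokens 0 and 1 attend only to each other (and themselves). Token 2, which is alone in its scene, attends only to itself, so its masked output equals its own value vector.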
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Label Smoothing · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Softmax · Dropout
