Encoding Sentence Position in Context-Aware Neural Machine Translation with Concatenation
Lorenzo Lupo, Marco Dinarelli, Laurent Besacier

TL;DR
This paper explores methods for encoding sentence positions in context-aware neural machine translation, demonstrating benefits in English-Russian translation with specific encoding and training strategies, but not in English-German.
Contribution
It introduces and compares various sentence position encoding methods within Transformer models for context-aware translation, highlighting their effectiveness under certain training conditions.
Findings
Sentence position encoding improves English-Russian translation quality.
Benefits depend on training with a context-discounted loss.
The same benefits are not observed in English-German translation.
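To make the second finding concrete, here is a minimal sketch of what a context-discounted loss could look like. It assumes the loss is a sum of per-token negative log-likelihoods in which tokens belonging to context sentences are down-weighted by a constant factor. The function name, the `discount` parameter, and its default value are hypothetical and chosen only for illustration; the exact formulation is the one given in Lupo et al. (2022), which this summary does not reproduce.

```python
from typing import List


def context_discounted_loss(
    token_nll: List[float],
    is_context: List[bool],
    discount: float = 0.5,  # hypothetical value, not taken from the paper
) -> float:
    """Sum per-token losses, scaling context-sentence tokens by `discount`.

    token_nll  : negative log-likelihood of each target token
    is_context : True where the token belongs to a context sentence,
                 False where it belongs to the current sentence
    """
    if len(token_nll) != len(is_context):
        raise ValueError("token_nll and is_context must have the same length")
    # Current-sentence tokens keep weight 1.0; context tokens are discounted.
    return sum(
        nll * (discount if ctx else 1.0)
        for nll, ctx in zip(token_nll, is_context)
    )
```

With `discount` set to 1.0 this reduces to the ordinary summed loss over the whole concatenation window, so the discount can be read as a knob that shifts training focus toward the current sentence.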
Abstract
Context-aware translation can be achieved by processing a concatenation of consecutive sentences with the standard Transformer architecture. This paper investigates the intuitive idea of providing the model with explicit information about the position of the sentences contained in the concatenation window. We compare various methods to encode sentence positions into token representations, including novel methods. Our results show that the Transformer benefits from certain sentence position encoding methods on English to Russian translation if trained with a context-discounted loss (Lupo et al., 2022). However, the same benefits are not observed in English to German. Further empirical efforts are necessary to define the conditions under which the proposed approach is beneficial.
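As an illustration of "explicit information about the position of the sentences", the sketch below assigns each token in a concatenation window an integer sentence position. It is only a sketch under stated assumptions: the `<sep>` separator token, the function name, and the convention of counting distance backwards from the current (last) sentence are all hypothetical and not taken from the paper. In a Transformer, such indices would typically be mapped to vectors and combined with the token representations, but how that is done is exactly what the paper's methods vary.

```python
from typing import List

SEP = "<sep>"  # hypothetical separator token marking sentence boundaries


def sentence_positions(tokens: List[str], sep: str = SEP) -> List[int]:
    """Return, for each token, the distance of its sentence from the last one.

    Tokens of the current (last) sentence get 0, tokens of the sentence
    before it get 1, and so on. A separator is counted as part of the
    sentence it closes.
    """
    left_to_right = []
    index = 0
    for tok in tokens:
        left_to_right.append(index)
        if tok == sep:
            index += 1  # the next token starts a new sentence
    last = left_to_right[-1] if left_to_right else 0
    # Convert left-to-right indices into distances from the last sentence.
    return [last - i for i in left_to_right]


window = "a b <sep> c <sep> d e".split()
print(sentence_positions(window))  # [2, 2, 2, 1, 1, 0, 0]
```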
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Layer Normalization · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax
