A comparison of translation performance between DeepL and Supertext
Alex Flückiger, Chantal Amrhein, Tim Graf, Frédéric Odermatt, Martin Pömsl, Philippe Schläpfer, Florian Schottmann, Samuel Läubli

TL;DR
This study compares the DeepL and Supertext machine translation systems using document-level evaluation. The results suggest that Supertext is more consistent across longer texts and underline the need for context-aware benchmarking methods.
Contribution
The paper introduces a document-level evaluation approach for MT systems and provides a comparative analysis of DeepL and Supertext across four language directions.
Findings
Supertext outperforms DeepL in three of four language directions at document level.
Segment-level assessments show no strong preference between the systems in most cases.
The results highlight the importance of context-sensitive evaluation for MT quality.
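The difference between the segment-level and document-level findings can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the ratings are invented, the names `RATINGS`, `segment_preferences`, and `document_preferences` are hypothetical, and averaging segment scores per document is only one possible aggregation. It is not the paper's evaluation protocol or data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings (NOT from the paper):
# (document_id, segment_index, score_system_a, score_system_b)
RATINGS = [
    ("doc1", 0, 4, 4),
    ("doc1", 1, 4, 4),
    ("doc1", 2, 3, 4),
    ("doc2", 0, 5, 5),
    ("doc2", 1, 4, 4),
    ("doc2", 2, 4, 5),
    ("doc2", 3, 4, 4),
    ("doc3", 0, 4, 3),
    ("doc3", 1, 4, 4),
]


def _winner(score_a, score_b):
    """Return 'a', 'b', or 'tie' for a pair of scores."""
    if score_a > score_b:
        return "a"
    if score_b > score_a:
        return "b"
    return "tie"


def segment_preferences(ratings):
    """Count wins and ties segment by segment."""
    counts = {"a": 0, "b": 0, "tie": 0}
    for _doc, _seg, score_a, score_b in ratings:
        counts[_winner(score_a, score_b)] += 1
    return counts


def document_preferences(ratings):
    """Average the segment scores per document, then count wins per document."""
    by_doc = defaultdict(lambda: ([], []))
    for doc, _seg, score_a, score_b in ratings:
        by_doc[doc][0].append(score_a)
        by_doc[doc][1].append(score_b)
    counts = {"a": 0, "b": 0, "tie": 0}
    for scores_a, scores_b in by_doc.values():
        counts[_winner(mean(scores_a), mean(scores_b))] += 1
    return counts


print(segment_preferences(RATINGS))
print(document_preferences(RATINGS))
```

In this toy data most individual segments are ties, so the segment-level view shows little preference, while small but repeated differences within a document add up to a clear per-document winner.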
Abstract
As strong machine translation (MT) systems are increasingly based on large language models (LLMs), reliable quality benchmarking requires methods that capture their ability to leverage extended context. This study compares two commercial MT systems, DeepL and Supertext, by assessing their performance on unsegmented texts. We evaluate translation quality across four language directions with professional translators assessing segments with full document-level context. While segment-level assessments indicate no strong preference between the systems in most cases, document-level analysis reveals a preference for Supertext in three out of four language directions, suggesting superior consistency across longer texts. We advocate for more context-sensitive evaluation methodologies to ensure that MT quality assessments reflect real-world usability. We release all evaluation data and…
Taxonomy
Topics: Natural Language Processing Techniques
