PragExTra: A Multilingual Corpus of Pragmatic Explicitation in Translation
Doreen Osmelak, Koel Dutta Chowdhury, Uliana Sentsova, Cristina España-Bonet, Josef van Genabith

TL;DR
This paper introduces PragExTra, a multilingual corpus and detection framework for pragmatic explicitation in translation, enabling computational analysis of cultural and contextual enrichments across eight language pairs.
Contribution
It presents the first multilingual corpus and detection method for pragmatic explicitation, demonstrating improved classifier accuracy with active learning across languages.
Findings
Entity and system-level explicitation are most frequent.
Active learning improves classifier accuracy by 7-8 percentage points.
The classifier reaches up to 0.88 accuracy and 0.82 F1 across languages.
Abstract
Translators often enrich texts with background details that make implicit cultural meanings explicit for new audiences. This phenomenon, known as pragmatic explicitation, has been widely discussed in translation theory but rarely modeled computationally. We introduce PragExTra, the first multilingual corpus and detection framework for pragmatic explicitation. The corpus covers eight language pairs from TED-Multi and Europarl and includes additions such as entity descriptions, measurement conversions, and translator remarks. We identify candidate explicitation cases through null alignments and refine them using active learning with human annotation. Our results show that entity and system-level explicitations are most frequent, and that active learning improves classifier accuracy by 7-8 percentage points, achieving up to 0.88 accuracy and 0.82 F1 across languages. PragExTra establishes…
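The two steps named in the abstract, candidate identification through null alignments and refinement through active learning, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes word alignments are available as Pharaoh-style "i-j" strings (source index, target index), treats runs of target tokens with no aligned source token as candidates, and uses uncertainty sampling as one common way to choose items for human annotation. The abstract does not say which alignment tool or selection strategy is used. The function names, the `min_len` threshold, and the example sentence are all hypothetical.

```python
from typing import List, Tuple


def parse_alignment(pharaoh: str) -> List[Tuple[int, int]]:
    """Parse a Pharaoh-style string such as '0-0 1-1' into (source, target) index pairs."""
    pairs = []
    for item in pharaoh.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def null_aligned_spans(
    target_tokens: List[str],
    alignment: List[Tuple[int, int]],
    min_len: int = 2,  # hypothetical threshold to skip isolated function words
) -> List[str]:
    """Return contiguous runs of target tokens that no source token aligns to."""
    aligned_targets = {tgt for _, tgt in alignment}
    spans, current = [], []
    for idx, token in enumerate(target_tokens):
        if idx not in aligned_targets:
            current.append(token)
            continue
        if len(current) >= min_len:
            spans.append(" ".join(current))
        current = []
    if len(current) >= min_len:
        spans.append(" ".join(current))
    return spans


def most_uncertain(probabilities: List[float], k: int) -> List[int]:
    """Uncertainty sampling: indices of the k scores closest to 0.5."""
    order = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return order[:k]


# Invented example: the German target adds a gloss for "NHS".
source = "She works for the NHS .".split()
target = "Sie arbeitet für den NHS , den britischen Gesundheitsdienst .".split()
alignment = parse_alignment("0-0 1-1 2-2 3-3 4-4 5-9")

candidates = null_aligned_spans(target, alignment)
print(candidates)

# Invented classifier scores for four candidates; the two least certain go to annotators.
to_annotate = most_uncertain([0.95, 0.52, 0.10, 0.45], k=2)
print(to_annotate)
```

In this sketch the unaligned run ", den britischen Gesundheitsdienst" surfaces as a candidate entity description, and the candidates the classifier is least sure about are routed to human annotators first. Null alignment alone over-generates, since reordering and alignment noise also leave tokens unaligned, which is consistent with the abstract's use of a refinement step.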
