DOLFIN -- Document-Level Financial test set for Machine Translation
Mariam Nakhlé, Marco Dinarelli, Raheel Qader, Emmanuelle Esperança-Rodier, Hervé Blanchon

TL;DR
DOLFIN is a new, specialized document-level machine translation test set for financial texts, designed to better evaluate models' handling of complex linguistic phenomena and domain-specific challenges.
Contribution
The paper introduces a domain-specific, section-based test set for document-level MT, addressing the limitations of existing test sets, which still follow a sentence-level logic.
Findings
The test set effectively discriminates between context-sensitive and context-agnostic models.
Models show weaknesses in translating complex financial texts.
The dataset is publicly available for research use.
Abstract
Despite the strong research interest in document-level Machine Translation (MT), the test sets dedicated to this task are still scarce. The existing test sets mainly cover topics from the general domain and fall short on specialised domains, such as legal and financial. Also, in spite of their document-level aspect, they still follow a sentence-level logic that does not allow for including certain linguistic phenomena such as information reorganisation. In this work, we aim to fill this gap by proposing a novel test set: DOLFIN. The dataset is built from specialised financial documents, and it makes a step towards true document-level MT by abandoning the paradigm of perfectly aligned sentences, presenting data in units of sections rather than sentences. The test set consists of an average of 1950 aligned sections for five language pairs. We present a detailed data collection pipeline…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
