An Explainable Approach to Document-level Translation Evaluation with Topic Modeling
Hyeokmin Lee, Youngkyu Kim, Byounghyun Yoo

TL;DR
This paper introduces a reference-free method for evaluating document-level translation that uses topic modeling to assess thematic consistency between source and translated texts.
Contribution
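It proposes an evaluation framework that uses topic modeling techniques such as latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and BERTopic to measure how well a translation preserves the source document's themes, without requiring reference translations.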
Findings
The framework effectively captures thematic integrity in translations.
The authors report that it outperforms existing metrics in evaluating document-level translation quality.
Visualization of key tokens provides intuitive insights into translation fidelity.
Abstract
The advent of NMT has expanded the scope of translation beyond isolated sentences, enabling context to be preserved across paragraphs and documents. However, current evaluation metrics largely remain restricted to the sentence level and typically depend on reference translations. Without references, existing metrics cannot provide a clear basis for their quality assessments. To address these limitations, we propose an evaluation framework that independently extracts and compares latent topic structures within source and translated texts. This framework utilises various topic modelling techniques, including LSA, LDA and BERTopic, to achieve this. Our methodology captures statistical frequency information and semantic context, providing a comprehensive evaluation of the entire document. It aligns key topic tokens across languages using a bilingual dictionary and quantifies thematic…
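The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes the topic tokens and their weights have already been extracted by some topic model (LSA, LDA, or BERTopic), and it uses a toy bilingual dictionary. The abstract is cut off before it states how thematic agreement is quantified, so the weighted-coverage score below is a stand-in chosen for illustration, not the paper's metric. The function name `thematic_coverage` and all sample data are hypothetical.

```python
def thematic_coverage(src_topic_weights, tgt_topic_tokens, bilingual_dict):
    """Weighted share of source topic tokens whose translation appears
    among the target document's topic tokens.

    NOTE: this score is an illustrative assumption, not the paper's formula.

    src_topic_weights: dict mapping source-language token -> topic weight
    tgt_topic_tokens:  iterable of target-language topic tokens
    bilingual_dict:    dict mapping source token -> list of candidate translations
    """
    tgt_tokens = set(tgt_topic_tokens)
    matched, unmatched = [], []

    for token, weight in src_topic_weights.items():
        candidates = bilingual_dict.get(token, [])
        # A source token counts as preserved if any candidate translation
        # shows up among the target-side topic tokens.
        if any(c in tgt_tokens for c in candidates):
            matched.append((token, weight))
        else:
            unmatched.append((token, weight))

    total = sum(src_topic_weights.values())
    score = sum(w for _, w in matched) / total if total > 0 else 0.0
    return score, matched, unmatched


# Hypothetical example: French source topic tokens vs. English translation topic tokens.
src = {"économie": 0.5, "marché": 0.3, "banque": 0.2}
tgt = ["economy", "market", "weather"]
fr_en = {"économie": ["economy"], "marché": ["market"], "banque": ["bank"]}

score, matched, unmatched = thematic_coverage(src, tgt, fr_en)
print(f"coverage = {score:.2f}")
print("unmatched source tokens:", [t for t, _ in unmatched])
```

Returning the matched and unmatched tokens alongside the score mirrors the explainability goal stated in the findings: the unmatched list shows which source themes have no counterpart in the translation.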
