# Train, Sort, Explain: Learning to Diagnose Translation Models

**Authors:** Robert Schwarzenberg, David Harbecke, Vivien Macketanz, Eleftherios Avramidis, Sebastian Möller

arXiv: 1903.12017 · 2019-03-29

## TL;DR

This paper introduces DiaMaT, a neural classifier-based method to automatically identify and explain systematic differences between human and machine translations, aiding evaluation beyond traditional metrics.

## Contribution

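The paper proposes a general approach that trains a neural text classifier to distinguish human from machine translations, then applies neural explainability methods to expose the systematic differences the classifier has learned. This narrows the gap between low-effort automatic metrics such as BLEU and high-effort linguistic evaluation by humans.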

## Key findings

- DiaMaT reaches 75% classification accuracy when distinguishing human translations from those of a state-of-the-art neural Transformer model.
- The method exposes meaningful systematic differences between human translations and the Transformer's output.
- The proof-of-concept implementation, DiaMaT, is open source.

## Abstract

Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach on how to automatically expose systematic differences between human and machine translations to human experts. Inspired by adversarial settings, we train a neural text classifier to distinguish human from machine translations. A classifier that performs and generalizes well after training should recognize systematic differences between the two classes, which we uncover with neural explainability methods. Our proof-of-concept implementation, DiaMaT, is open source. Applied to a dataset translated by a state-of-the-art neural Transformer model, DiaMaT achieves a classification accuracy of 75% and exposes meaningful differences between humans and the Transformer, amidst the current discussion about human parity.
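The train-then-explain idea in the abstract can be illustrated with a minimal sketch. It is not DiaMaT itself: a bag-of-words logistic regression stands in for the neural classifier, and reading off per-token weights stands in for the neural explainability methods. The sentences, labels, function names, and hyperparameters are all invented for illustration.

```python
import math
from collections import defaultdict

# Toy, invented examples (NOT from the paper's dataset).
# Label 1 = machine translation, label 0 = human translation.
TRAIN = [
    ("he has made a decision yesterday", 1),
    ("she has made a visit yesterday", 1),
    ("they have made a choice yesterday", 1),
    ("he decided yesterday", 0),
    ("she visited yesterday", 0),
    ("they chose yesterday", 0),
]

def train(examples, epochs=200, lr=0.5):
    """Fit a bag-of-words logistic regression with plain SGD.

    Assumption: a linear model is used here only as a stand-in
    for the neural text classifier described in the abstract.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = text.split()
            score = bias + sum(weights[t] for t in tokens)
            prob = 1.0 / (1.0 + math.exp(-score))
            grad = prob - label  # gradient of log loss w.r.t. the score
            bias -= lr * grad
            for t in tokens:
                weights[t] -= lr * grad
    return weights, bias

def predict(weights, bias, text):
    """Return 1 (machine) if the score is positive, else 0 (human)."""
    score = bias + sum(weights.get(t, 0.0) for t in text.split())
    return 1 if score > 0 else 0

def explain(weights, text):
    """Rank tokens by their contribution to the 'machine' score."""
    contribs = [(t, weights.get(t, 0.0)) for t in text.split()]
    return sorted(contribs, key=lambda pair: pair[1], reverse=True)

weights, bias = train(TRAIN)

# Accuracy on the toy training set only; a real study would use held-out data.
accuracy = sum(predict(weights, bias, t) == y for t, y in TRAIN) / len(TRAIN)

# Tokens the classifier relies on most when calling a sentence 'machine'.
ranking = explain(weights, "he has made a decision yesterday")
top_token = ranking[0][0]
```

In this toy setup the highest-ranked tokens are the ones that recur only in the "machine" class, which is the kind of systematic cue a human expert would then inspect. With a neural classifier, the weight lookup in `explain` would be replaced by an attribution method suited to that model.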

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.12017/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1903.12017/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1903.12017/full.md

---
Source: https://tomesphere.com/paper/1903.12017