# Fine-grained human evaluation of neural versus phrase-based machine translation

**Authors:** Filip Klubička, Antonio Toral, Víctor M. Sánchez-Cartagena

arXiv: 1706.04389 · 2018-02-13

## TL;DR

This paper manually compares three machine translation systems (pure phrase-based, factored phrase-based and neural) through fine-grained error annotation, finding that the neural system produces 54% fewer errors than the phrase-based one.

## Contribution

It applies an error annotation scheme, with error types compliant with the Multidimensional Quality Metrics (MQM) framework, to the outputs of three systems, giving a direct comparison in which the neural system performs best.

## Key findings

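- The best-performing system (neural) reduces the errors produced by the worst system (phrase-based) by 54%.
- Agreement between the two annotators is high for a task of this kind, which supports the reliability of the annotation.
- The MQM-compliant error annotation breaks the comparison down by error type rather than reporting only an overall score.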
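
To make the 54% figure concrete, here is a minimal sketch of how a relative error reduction can be computed from annotated error counts. The function name and the counts are hypothetical, chosen only so that the arithmetic gives 54%. They are not the paper's data.

```python
def relative_error_reduction(baseline_errors: int, system_errors: int) -> float:
    """Return the fraction of the baseline's errors that the system eliminates."""
    if baseline_errors <= 0:
        raise ValueError("baseline_errors must be positive")
    return (baseline_errors - system_errors) / baseline_errors


# Hypothetical error counts (NOT taken from the paper), picked to illustrate
# what a 54% reduction means.
phrase_based_errors = 1000
neural_errors = 460

reduction = relative_error_reduction(phrase_based_errors, neural_errors)
print(f"Relative error reduction: {reduction:.0%}")
```

Under this reading, a 54% reduction means the neural output contains a little under half as many annotated errors as the phrase-based output on the same material.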

## Abstract

We compare three approaches to statistical machine translation (pure phrase-based, factored phrase-based and neural) by performing a fine-grained manual evaluation via error annotation of the systems' outputs. The error types in our annotation are compliant with the multidimensional quality metrics (MQM), and the annotation is performed by two annotators. Inter-annotator agreement is high for such a task, and results show that the best performing system (neural) reduces the errors produced by the worst system (phrase-based) by 54%.
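
The abstract reports high inter-annotator agreement but this summary does not say which statistic was used. As an illustration only, the sketch below computes Cohen's kappa, a common chance-corrected agreement measure for two annotators. The choice of kappa, the function name, and the labels are all assumptions made for this example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical per-token judgements (illustrative only, not the paper's data).
annotator_1 = ["error", "ok", "ok", "error", "ok", "ok"]
annotator_2 = ["error", "ok", "error", "error", "ok", "ok"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Kappa discounts agreement that would occur by chance, which matters in error annotation because most tokens are typically unmarked and raw percentage agreement can look inflated.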

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.04389/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1706.04389/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/1706.04389/full.md

---
Source: https://tomesphere.com/paper/1706.04389