The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results
Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao, Steffen Eger, Yang Gao

TL;DR
This paper introduces the Eval4NLP-2021 shared task focused on explainable quality estimation for translations, emphasizing both scoring and interpretability, and presents data, participating systems, and analysis of results.
Contribution
It is the first shared task on explainable NLP evaluation metrics, providing a benchmark for explainable quality estimation in translation.
Findings
Six systems participated in the shared task
Analysis of system performances and approaches
Datasets and evaluation results are publicly available
Abstract
In this paper, we introduce the Eval4NLP-2021 shared task on explainable quality estimation. Given a source-translation pair, this shared task requires participants not only to provide a sentence-level score indicating the overall quality of the translation, but also to explain this score by identifying the words that negatively impact translation quality. We present the data, annotation guidelines and evaluation setup of the shared task, describe the six participating systems, and analyze the results. To the best of our knowledge, this is the first shared task on explainable NLP evaluation metrics. Datasets and results are available at https://github.com/eval4nlp/SharedTask2021.
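The task therefore has two outputs to evaluate: a sentence-level quality score and per-word explanation scores that should rank the erroneous words highest. The abstract does not name the metrics, so the sketch below is an illustration under stated assumptions, not the official evaluation script from the repository. It assumes sentence-level scores are compared to gold quality judgments with Pearson correlation, and that explanations are compared to binary gold word tags (1 = error) with two ranking metrics: average precision and recall at top-K, where K is the number of gold error words. The function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation between predicted and gold sentence-level scores."""
    n = len(pred)
    mp, mg = sum(pred) / n, sum(gold) / n
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gold))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    sg = sqrt(sum((g - mg) ** 2 for g in gold))
    return cov / (sp * sg)


def _rank(scores: Sequence[float]) -> list[int]:
    # Token indices ordered from most to least "blamed" by the explanation;
    # ties keep their original order because sorted() is stable.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_top_k(gold_tags: Sequence[int], scores: Sequence[float]) -> float:
    """Share of gold error words among the K top-scored words, K = #gold errors."""
    k = sum(gold_tags)
    if k == 0:
        return float("nan")  # undefined for a translation with no error words
    top = _rank(scores)[:k]
    return sum(gold_tags[i] for i in top) / k


def average_precision(gold_tags: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@rank taken at each rank where a gold error word appears."""
    k = sum(gold_tags)
    if k == 0:
        return float("nan")
    hits, total = 0, 0.0
    for rank, i in enumerate(_rank(scores), start=1):
        if gold_tags[i]:
            hits += 1
            total += hits / rank
    return total / k


if __name__ == "__main__":
    # Toy translation of five words; words 1 and 4 are gold errors.
    gold_tags = [0, 1, 0, 0, 1]
    explanation = [0.1, 0.9, 0.5, 0.2, 0.4]
    print(recall_at_top_k(gold_tags, explanation))   # 0.5
    print(average_precision(gold_tags, explanation))  # 0.8333333333333333
```

In the toy example, the explanation ranks word 1 first but places the correct word 2 ahead of error word 4, so only one of the two top-ranked words is an error (recall at top-2 = 0.5), while average precision, (1/1 + 2/3) / 2 = 5/6, still partly credits word 4 for appearing at rank 3.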
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
