Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task

Xiang Geng; Zhejian Lai; Yu Zhang; Shimin Tao; Hao Yang; Jiajun Chen; Shujian Huang

arXiv:2309.13230 · cs.CL · December 12, 2023

TL;DR

This paper presents NJUNLP's approach to the WMT2023 QE shared task, combining pseudo data generation, pre-training, and fine-tuning of models for improved quality estimation at sentence, word, and span levels.

Contribution

The paper introduces a unified framework that leverages pseudo data and joint learning for sentence, word, and span quality estimation tasks.
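
One common way to write a joint objective of this kind is sketched below. This is an assumption for illustration only: the summary above does not state the loss functions or the weighting the authors used, and the symbols $\lambda$, $\hat{y}$, and $p_\theta$ are introduced here, not taken from the paper.

```latex
% Hypothetical joint objective: sentence-level regression plus word-level tagging.
% y: gold sentence score, \hat{y}: predicted score,
% t_i: gold tag (OK/BAD) of target token i, n: number of target tokens,
% \lambda: assumed weighting hyper-parameter.
\mathcal{L}_{\text{joint}}
  = \underbrace{(\hat{y} - y)^2}_{\text{sentence-level}}
  \; + \; \lambda \,
    \underbrace{\left( -\frac{1}{n} \sum_{i=1}^{n} \log p_\theta(t_i \mid \text{src}, \text{mt}) \right)}_{\text{word-level}}
```

Under this sketch, the same objective would be applied first to pseudo QE data (pre-training) and then to real QE data (fine-tuning).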

Findings

01

Achieved top results in English-German QE sub-tasks.

02

Effective use of pseudo MQM data for pre-training.

03

Proposed a simple method to convert word-level outputs to span-level results.

Abstract

We introduce the submissions of the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task. Our team submitted predictions for the English-German language pair on both sub-tasks: (i) sentence- and word-level quality prediction; and (ii) fine-grained error span detection. This year, we further explore pseudo data methods for QE based on the NJUQE framework (https://github.com/NJUNLP/njuqe). We generate pseudo MQM data using parallel data from the WMT translation task. We pre-train the XLMR large model on pseudo QE data, then fine-tune it on real QE data. At both stages, we jointly learn sentence-level scores and word-level tags. Empirically, we conduct experiments to find the key hyper-parameters that improve the performance. Technically, we propose a simple method that converts the word-level outputs to fine-grained error span results. Overall, our models achieved the best results…
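
The abstract does not spell out how word-level outputs are converted to error spans. As a minimal sketch of one plausible approach, and not necessarily the authors' method, the code below merges runs of consecutive BAD-tagged tokens into character-offset spans. The function name `tags_to_spans`, the `OK`/`BAD` tag strings, and whitespace tokenization are all assumptions made for illustration.

```python
from typing import List, Tuple


def tags_to_spans(text: str, tokens: List[str], tags: List[str]) -> List[Tuple[int, int]]:
    """Merge consecutive BAD tokens into (start, end) character spans, end exclusive.

    Hypothetical helper: assumes every token occurs in `text` in order.
    """
    spans: List[Tuple[int, int]] = []
    cursor = 0          # position in `text` after the previous token
    current = None      # [start, end] of the span being built, if any
    for token, tag in zip(tokens, tags):
        start = text.index(token, cursor)  # locate the token from the cursor onward
        end = start + len(token)
        cursor = end
        if tag == "BAD":
            if current is None:
                current = [start, end]     # open a new span
            else:
                current[1] = end           # extend the open span
        elif current is not None:
            spans.append((current[0], current[1]))  # an OK token closes the span
            current = None
    if current is not None:                # close a span that runs to the end
        spans.append((current[0], current[1]))
    return spans


mt = "Das ist ein sehr gutes Haus"
word_tags = ["OK", "OK", "BAD", "BAD", "OK", "BAD"]
spans = tags_to_spans(mt, mt.split(), word_tags)
print([mt[s:e] for s, e in spans])
```

In this sketch, adjacent BAD tokens collapse into a single span that also covers the whitespace between them. A real system would additionally need to map subword pieces back to words and handle any extra attributes the span-level task expects.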

Code & Models

Repositories

njunlp/njuqe
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques