CUNI Non-Autoregressive System for the WMT 22 Efficient Translation Shared Task

Jindřich Helcl

arXiv:2212.00477 · cs.CL · December 2, 2022

TL;DR

This paper introduces a non-autoregressive translation system for WMT 22, focusing on establishing reliable baselines and evaluation methods, especially for decoding speed, using a 12-layer Transformer trained with CTC on distilled data.

Contribution

It presents a non-autoregressive translation model with a robust evaluation methodology, facilitating fair comparison with autoregressive models.

Findings

01. Provides a solid baseline for non-autoregressive translation

02. Highlights the importance of evaluation metrics for decoding speed

03. Uses knowledge distillation to improve model performance

Abstract

We present a non-autoregressive system submission to the WMT 22 Efficient Translation Shared Task. Our system was used by Helcl et al. (2022) in an attempt to provide a fair comparison between non-autoregressive and autoregressive models. This submission is an effort to establish solid baselines along with a sound evaluation methodology, particularly for measuring decoding speed. The model itself is a 12-layer Transformer trained with connectionist temporal classification (CTC) on a dataset produced by knowledge distillation from a strong autoregressive teacher model.
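The abstract does not describe the decoding step, so the following is only a minimal sketch of the standard CTC output rule, not the submission's code: a CTC-trained model emits one label per output position in parallel, and the final translation is obtained by merging adjacent repeated labels and then removing blank symbols. The function name `ctc_collapse`, the blank token string `<blank>`, and the example tokens are assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical name for the CTC blank symbol; real vocabularies define their own.
BLANK = "<blank>"


def ctc_collapse(position_labels, blank=BLANK):
    """Apply the standard CTC collapse: merge adjacent repeats, then drop blanks."""
    # groupby yields one key per run of identical adjacent labels.
    merged = [label for label, _ in groupby(position_labels)]
    # Blanks only separate runs; they are not part of the output.
    return [label for label in merged if label != blank]


# Illustrative per-position labels, as if taken from an argmax over the vocabulary.
labels = ["<blank>", "the", "the", "<blank>", "cat", "<blank>", "<blank>", "sat", "sat"]
print(ctc_collapse(labels))
```

Because repeats are merged before blanks are removed, a blank between two identical labels keeps both of them, which is how CTC represents genuinely repeated output tokens.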

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Adam · Layer Normalization · Absolute Position Encodings · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing