CUNI Non-Autoregressive System for the WMT 22 Efficient Translation Shared Task
Jindřich Helcl

TL;DR
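This paper describes a non-autoregressive translation system submitted to the WMT 22 Efficient Translation Shared Task. It aims to establish reliable baselines and a sound evaluation methodology, particularly for measuring decoding speed. The model is a 12-layer Transformer trained with connectionist temporal classification (CTC) on knowledge-distilled data.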
Contribution
It presents a non-autoregressive translation model with a robust evaluation methodology, facilitating fair comparison with autoregressive models.
Findings
Provides a solid baseline for non-autoregressive translation
Highlights the importance of sound methodology for measuring decoding speed
Uses knowledge distillation to improve model performance
Abstract
We present a non-autoregressive system submission to the WMT 22 Efficient Translation Shared Task. Our system was used by Helcl et al. (2022) in an attempt to provide a fair comparison between non-autoregressive and autoregressive models. This submission is an effort to establish solid baselines along with sound evaluation methodology, particularly in terms of measuring the decoding speed. The model itself is a 12-layer Transformer trained with connectionist temporal classification on a knowledge-distilled dataset produced by a strong autoregressive teacher model.
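The abstract does not spell out the decoding procedure. As a minimal sketch of how CTC outputs are generally turned into a token sequence, the example below picks the best symbol at every output position independently (which is what makes decoding non-autoregressive), merges consecutive repeats, and drops blanks. The function names, the blank index of 0, and the toy scores are assumptions made for illustration and are not taken from the paper.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_collapse(position_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Apply the standard CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [token for token, _ in groupby(position_ids)]
    return [token for token in merged if token != blank]


def greedy_ctc_decode(scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Choose the highest-scoring symbol at each position independently, then collapse.

    `scores` holds one row per output position and one column per vocabulary entry.
    No position depends on another position's choice, so all of them can be
    computed in parallel in a real model.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    return ctc_collapse(best, blank)


if __name__ == "__main__":
    # Toy scores over a 2-symbol vocabulary: index 0 is blank, index 1 is a token.
    toy_scores = [[0.1, 0.9], [0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
    print(greedy_ctc_decode(toy_scores))
```

The blank between the two runs of symbol 1 in the toy input is what lets a repeated token survive the collapse; without it, the repeats would merge into a single token.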
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Layer Normalization · Absolute Position Encodings · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing
