CUNI System for the WMT17 Multimodal Translation Task
Jindřich Helcl, Jindřich Libovický

TL;DR
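This paper describes CUNI's submissions to the WMT17 Multimodal Translation Task, covering multimodal translation (Task 1) and cross-lingual image captioning (Task 2). The best systems rely on text-only neural translation trained with additional selected and back-translated data, and on a caption-then-translate pipeline.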
Contribution
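It presents a purely textual neural translation system trained on additional data (similar sentences selected from parallel corpora and synthetic data from back-translation) and a cross-lingual image captioning pipeline that first generates an English caption and then translates it.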
Findings
Data augmentation improved translation quality
Back-translation contributed to better model performance
Negative results highlight potential directions for future research
Abstract
In this paper, we describe our submissions to the WMT17 Multimodal Translation Task. For Task 1 (multimodal translation), our best scoring system is a purely textual neural translation of the source image caption to the target language. The main feature of the system is the use of additional data that was acquired by selecting similar sentences from parallel corpora and by data synthesis with back-translation. For Task 2 (cross-lingual image captioning), our best submitted system generates an English caption which is then translated by the best system used in Task 1. We also present negative results, which are based on ideas that we believe have the potential to bring improvements, but did not prove to be useful in our particular setup.
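The two ideas named in the abstract, back-translation for data synthesis and the caption-then-translate pipeline for Task 2, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `back_translate` and `caption_then_translate` are hypothetical, and the translator and captioner are passed in as plain callables standing in for trained models.

```python
from typing import Any, Callable, Iterable, List, Tuple


def back_translate(
    target_sentences: Iterable[str],
    target_to_source: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from monolingual target text.

    The target side is genuine text; the source side is machine-generated
    by a target-to-source translator (a hypothetical stand-in here).
    """
    return [(target_to_source(t), t) for t in target_sentences]


def caption_then_translate(
    image: Any,
    captioner: Callable[[Any], str],
    translator: Callable[[str], str],
) -> str:
    """Generate a source-language caption, then translate it."""
    return translator(captioner(image))


if __name__ == "__main__":
    # Toy stand-ins for trained models, for illustration only.
    toy_reverse = {"ein Hund rennt": "a dog runs"}
    pairs = back_translate(toy_reverse.keys(), lambda s: toy_reverse[s])
    print(pairs)
```

The synthetic pairs would then be added to the genuine parallel training data; how the two are mixed is a design choice this sketch leaves open.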
