TL;DR
This paper introduces a fluency-guided learning framework for cross-lingual image captioning that improves caption fluency and relevance in Chinese using only machine-translated sentences, without manual target language data.
Contribution
It presents a novel fluency-guided training method that enhances cross-lingual captioning models trained solely on machine-translated data.
Findings
Improves caption fluency in Chinese
Enhances relevance of generated captions
Does not require manually written target language sentences
Abstract
Image captioning has so far been explored mostly in English, as most available datasets are in this language. However, the application of image captioning should not be restricted by language. Only a few studies have been conducted on image captioning in a cross-lingual setting. Unlike these works, which manually build a dataset for the target language, we aim to learn a cross-lingual captioning model entirely from machine-translated sentences. To overcome the lack of fluency in the translated sentences, we propose a fluency-guided learning framework. The framework comprises one module that automatically estimates the fluency of the sentences and another that uses the estimated fluency scores to train an image captioning model for the target language effectively. As experiments on two bilingual (English-Chinese) datasets show, our approach improves both fluency and…
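The abstract does not say how the estimated fluency scores enter training. One plausible reading is that each machine-translated caption contributes to the loss in proportion to its fluency. The sketch below illustrates that idea only; it is not taken from the paper. The function names, the `min_weight` parameter, and the assumption that fluency scores lie in [0, 1] are all hypothetical.

```python
import math


def sentence_nll(token_probs):
    """Negative log-likelihood of one caption, given the probability
    the captioning model assigned to each reference token."""
    return -sum(math.log(p) for p in token_probs)


def fluency_weighted_loss(batch, min_weight=0.0):
    """Weighted average of caption losses (hypothetical helper).

    batch: list of (token_probs, fluency) pairs, where fluency is an
    assumed score in [0, 1] produced by a separate fluency estimator.
    min_weight: assumed floor so disfluent captions can still contribute.
    """
    total = 0.0
    weight_sum = 0.0
    for token_probs, fluency in batch:
        weight = max(min_weight, fluency)
        total += weight * sentence_nll(token_probs)
        weight_sum += weight
    # With no usable captions there is nothing to learn from.
    return total / weight_sum if weight_sum > 0 else 0.0


# Toy batch: a fluent translation and one scored as not fluent at all.
batch = [
    ([0.5], 1.0),   # full weight
    ([0.25], 0.0),  # zero weight, so it drops out of the loss
]
loss = fluency_weighted_loss(batch)
```

With `min_weight=0.0`, captions scored as entirely disfluent are ignored; raising `min_weight` keeps them in play at a reduced weight, which trades some fluency for more training signal about image content.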
