Improving Captioning for Low-Resource Languages by Cycle Consistency

Yike Wu; Shiwan Zhao; Jia Chen; Ying Zhang; Xiaojie Yuan; Zhong Su

arXiv:1908.07810 · cs.CL · August 22, 2019 · 1 citation

Open Access

TL;DR

This paper introduces a unified model that combines translation-based and alignment-based approaches under a cycle consistency constraint, using English caption datasets to improve caption quality and word-region alignment for low-resource languages.

Contribution

The paper proposes a novel architecture that integrates translation and alignment approaches with cycle consistency, enabling effective use of large English caption datasets for low-resource language captioning.

Findings

1. Outperforms state-of-the-art methods on standard metrics
2. Improves fine-grained word-region alignment
3. Effectively leverages monolingual English datasets

Abstract

Improving the captioning performance on low-resource languages by leveraging English caption datasets has received increasing research interest in recent years. Existing works mainly fall into two categories: translation-based and alignment-based approaches. In this paper, we propose to combine the merits of both approaches in one unified architecture. Specifically, we use a pre-trained English caption model to generate high-quality English captions, and then take both the image and generated English captions to generate low-resource language captions. We improve the captioning performance by adding the cycle consistency constraint on the cycle of image regions, English words, and low-resource language words. Moreover, our architecture has a flexible design which enables it to benefit from large monolingual English caption datasets. Experimental results demonstrate that our approach…
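
The cycle over image regions, English words, and low-resource language words mentioned in the abstract can be illustrated with a small sketch. This is not the paper's actual loss, which the abstract does not specify. It assumes three hypothetical row-stochastic attention matrices (`region_to_en`, `en_to_tgt`, `tgt_to_region`) and penalizes cycles that fail to return to the region they started from. All function and variable names are illustrative assumptions.

```python
import math


def softmax_rows(scores):
    """Turn each row of raw scores into a probability distribution."""
    out = []
    for row in scores:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cycle_matrix(region_to_en, en_to_tgt, tgt_to_region):
    """Compose region -> English word -> target word -> region (assumed order)."""
    return matmul(matmul(region_to_en, en_to_tgt), tgt_to_region)


def cycle_consistency_loss(region_to_en, en_to_tgt, tgt_to_region):
    """Mean negative log-probability of returning to the starting region.

    Hypothetical formulation chosen for illustration only.
    """
    cycle = cycle_matrix(region_to_en, en_to_tgt, tgt_to_region)
    eps = 1e-12  # avoids log(0) when a cycle never returns
    return -sum(math.log(cycle[i][i] + eps) for i in range(len(cycle))) / len(cycle)


# Toy example: 2 regions, 2 English words, 2 target-language words.
identity = [[1.0, 0.0], [0.0, 1.0]]
uniform = softmax_rows([[0.0, 0.0], [0.0, 0.0]])

sharp_loss = cycle_consistency_loss(identity, identity, identity)
diffuse_loss = cycle_consistency_loss(uniform, uniform, uniform)
```

In this toy setup, perfectly sharp alignments give a loss near zero, while uniform attention gives a loss of about log 2, so minimizing such a term would push the three alignments to agree with one another.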

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition