Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang

TL;DR
This paper introduces a semi-supervised image captioning method that leverages scene graphs and Wasserstein distance to improve caption quality using limited labeled data and abundant unlabeled images.
Contribution
The novel SSIC-WGM approach uses scene graphs and Wasserstein graph matching for semi-supervised learning in image captioning, addressing cross-modal heterogeneity.
Findings
Improves captioning accuracy with limited labeled data.
Effectively utilizes unlabeled images through graph-based consistency constraints.
Demonstrates superior performance over existing semi-supervised methods.
Abstract
Image captioning automatically generates captions for given images, and the key challenge is to learn a mapping function from visual features to natural language features. Existing approaches are mostly supervised, i.e., each image in the training set has a corresponding sentence. However, because describing images requires a huge amount of manpower, real-world applications usually have a limited number of described images (i.e., image-text pairs) and a large number of undescribed images. This gives rise to the problem of "Semi-Supervised Image Captioning". To solve this problem, we propose a novel Semi-Supervised Image Captioning method considering Wasserstein Graph Matching (SSIC-WGM), which uses the raw image inputs to supervise the generated sentences. Unlike traditional single-modal semi-supervised methods, the difficulty of semi-supervised…
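The abstract is truncated before it describes how the graph matching is computed, so the following is only a minimal sketch of the general idea of comparing two graphs' node sets with an optimal-transport cost. It assumes each scene graph has already been reduced to a list of node embedding vectors with uniform weights, and it uses entropic-regularised Sinkhorn iterations as a stand-in for whatever Wasserstein formulation the paper actually uses. The names `cost_matrix`, `sinkhorn_distance`, `reg`, and `n_iters` are hypothetical and do not come from the paper.

```python
import math


def cost_matrix(xs, ys):
    """Squared Euclidean distance between every pair of node embeddings."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]


def sinkhorn_distance(xs, ys, reg=0.1, n_iters=200):
    """Approximate optimal-transport cost between two uniformly weighted
    sets of vectors, using entropic regularisation (Sinkhorn iterations).

    Assumption: costs are small relative to `reg`, so exp(-cost / reg)
    does not underflow to zero.
    """
    n, m = len(xs), len(ys)
    C = cost_matrix(xs, ys)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-c / reg) for c in row] for row in C]
    a = [1.0 / n] * n  # uniform mass on the first graph's nodes
    b = [1.0 / m] * m  # uniform mass on the second graph's nodes
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P[i][j] = u[i] * K[i][j] * v[j]; return its total cost.
    return sum(
        u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m)
    )


# Hypothetical node embeddings for a visual graph and a textual graph.
image_nodes = [[0.0, 0.0], [1.0, 0.0]]
caption_nodes = [[0.0, 1.0], [1.0, 1.0]]
print(sinkhorn_distance(image_nodes, image_nodes))    # close to zero
print(sinkhorn_distance(image_nodes, caption_nodes))  # close to one
```

In a semi-supervised setting, a distance of this kind could serve as a consistency signal between an undescribed image and its generated sentence, though the exact loss used by SSIC-WGM is not stated in the text available here.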
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
