Style-Aware Contrastive Learning for Multi-Style Image Captioning

Yucheng Zhou; Guodong Long

arXiv:2301.11367 · cs.CV · January 30, 2023 · 5 citations

TL;DR

This paper introduces a style-aware contrastive learning framework for multi-style image captioning, effectively aligning visual content with linguistic style to improve caption quality and style accuracy.

Contribution

It proposes a novel style-aware visual encoder and a style-aware triplet contrastive objective, along with three retrieval schemes (object-based, RoI-based, and triplet-based), to better integrate style and content in image captioning.
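A triplet contrastive objective of this kind can be illustrated with a small InfoNCE-style sketch. This is not the authors' implementation: the fusion of image and style embeddings by element-wise sum, cosine scoring, the temperature of 0.1, and the function names (`fuse`, `triplet_score`, `triplet_contrastive_loss`) are all hypothetical choices made for illustration. A negative triplet is any (image, style, caption) combination in which at least one element does not match.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (image, style, caption) embeddings


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (na * nb)


def fuse(image: Vector, style: Vector) -> List[float]:
    # Hypothetical fusion: element-wise sum of image and style embeddings.
    return [i + s for i, s in zip(image, style)]


def triplet_score(triplet: Triplet) -> float:
    # Match score: similarity between the fused (image, style) query and the caption.
    image, style, caption = triplet
    return cosine(fuse(image, style), caption)


def triplet_contrastive_loss(
    positive: Triplet, negatives: List[Triplet], temperature: float = 0.1
) -> float:
    """InfoNCE-style loss: -log softmax probability of the matched triplet."""
    logits = [triplet_score(positive) / temperature]
    logits += [triplet_score(t) / temperature for t in negatives]
    # Numerically stable log-sum-exp over all candidate triplets.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

As a usage example, with `img = [1.0, 0.0]` and `sty = [0.0, 1.0]`, the matched triplet `(img, sty, [1.0, 1.0])` scored against a mismatched-caption negative `(img, sty, [-1.0, -1.0])` and a mismatched-style negative `(img, [0.0, -1.0], [1.0, 1.0])` yields a loss close to zero, while treating a mismatched triplet as the positive yields a large loss. The softmax form is just one way to separate matched from mismatched triplets; a binary matched/mismatched classifier would be an equally plausible reading of the description.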

Findings

1. Achieves state-of-the-art performance on multi-style captioning benchmarks.
2. Effectively distinguishes matched and mismatched image-style-caption triplets.
3. Enhances the relevance of visual content to specified styles.

Abstract

Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style, and caption are matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval, and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach…
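The abstract does not give the form of the dynamic trade-off function, so the following is only a hypothetical sketch of how two retrieval signals might be blended into one score for ranking candidate samples. It assumes an object-based score computed as Jaccard overlap of detected object labels, a precomputed RoI-feature similarity, and a weight that shifts linearly from the object-based score to the RoI-based score over training. The names `object_overlap`, `tradeoff_weight`, `retrieval_score`, and `rank_candidates`, and the linear schedule itself, are illustrative assumptions rather than the paper's definitions.

```python
from typing import List, Set, Tuple

# (candidate id, detected object labels, precomputed RoI-feature similarity)
Candidate = Tuple[str, Set[str], float]


def object_overlap(a: Set[str], b: Set[str]) -> float:
    # Assumed object-based score: Jaccard similarity of object label sets.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tradeoff_weight(step: int, total_steps: int) -> float:
    # Hypothetical dynamic trade-off: the weight on the object-based score
    # decays linearly from 1 to 0 as training progresses.
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - progress


def retrieval_score(obj_score: float, roi_score: float, step: int, total_steps: int) -> float:
    # Convex combination of the two retrieval signals.
    lam = tradeoff_weight(step, total_steps)
    return lam * obj_score + (1.0 - lam) * roi_score


def rank_candidates(
    query_objects: Set[str], candidates: List[Candidate], step: int, total_steps: int
) -> List[Tuple[str, float]]:
    # Score every candidate and sort from most to least similar.
    scored = [
        (cid, retrieval_score(object_overlap(query_objects, objs), roi, step, total_steps))
        for cid, objs, roi in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this assumed schedule, a candidate sharing the query's objects ranks first early in training, while a candidate with higher RoI similarity overtakes it later. Treating the top-ranked candidates as positives and the lowest-ranked as negatives is one plausible way such scores could feed the contrastive objectives, but that choice is also an assumption here.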

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Contrastive Learning