Style-Aware Contrastive Learning for Multi-Style Image Captioning
Yucheng Zhou, Guodong Long

TL;DR
This paper introduces a style-aware contrastive learning framework for multi-style image captioning, effectively aligning visual content with linguistic style to improve caption quality and style accuracy.
Contribution
It proposes a novel style-aware visual encoder and triplet contrastive objective, along with three retrieval schemes, to better integrate style and content in image captioning.
Findings
Achieves state-of-the-art performance on multi-style captioning benchmarks.
Effectively distinguishes matched and mismatched image-style-caption triplets.
Enhances the relevance of visual content to specified styles.
Abstract
Existing multi-style image captioning methods show promising results in generating captions with accurate visual content and the desired linguistic style. However, they overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style, and caption are matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes (object-based retrieval, RoI-based retrieval, and triplet-based retrieval) and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach…
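The idea of a triplet contrast objective can be illustrated with a minimal sketch. The abstract does not give the paper's loss, so the code below substitutes a generic InfoNCE-style contrastive loss; the fusion rule in `triplet_score` (adding the image and style embeddings), the temperature value, and the toy vectors are all assumptions made for illustration, not the authors' formulation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Cosine similarity; assumes neither vector is all zeros.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def triplet_score(image, style, caption):
    # Hypothetical fusion (an assumption, not from the paper): add the image
    # and style embeddings, then compare the result with the caption embedding.
    fused = [i + s for i, s in zip(image, style)]
    return cosine(fused, caption)


def info_nce(pos_score, neg_scores, temperature=0.1):
    # Generic InfoNCE loss: -log softmax of the positive among all candidates.
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


# Toy 2-d embeddings (made up for illustration).
image = [1.0, 0.0]
style_a = [0.0, 1.0]
style_b = [0.0, -1.0]
caption_a = [1.0, 1.0]   # written for (image, style_a)
caption_b = [1.0, -1.0]  # written for (image, style_b)

pos = triplet_score(image, style_a, caption_a)      # matched triplet
neg_caption = triplet_score(image, style_a, caption_b)  # wrong caption
neg_style = triplet_score(image, style_b, caption_a)    # wrong style
loss = info_nce(pos, [neg_caption, neg_style])
print(pos, neg_caption, neg_style, loss)
```

In this toy setup the matched triplet scores higher than either mismatched triplet, so the loss is close to zero; swapping the positive with one of the negatives makes it large. In a real system the retrieval schemes named in the abstract would supply the positive and negative samples.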
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Contrastive Learning
