ADS-Cap: A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora

Kanzhi Cheng; Zheng Ma; Shi Zong; Jianbing Zhang; Xinyu Dai; Jiajun Chen

arXiv:2308.01143 · cs.CV · August 3, 2023

TL;DR

ADS-Cap is a novel framework that generates accurate, diverse, and stylistically consistent image captions by combining contrastive learning, a conditional variational auto-encoder, and a recheck module, making effective use of unpaired stylistic corpora.

Contribution

The paper introduces ADS-Cap, a new framework that unifies paired factual data and unpaired stylistic corpora during training, improving both the diversity and the accuracy of stylized captions.
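The unification described above rests on mapping images and sentences into one shared feature space, so that text-only stylistic sentences can be encoded where image features live. A common way to learn such an alignment is a CLIP-style symmetric InfoNCE loss, sketched below in plain Python. This is an assumption for illustration: the function names, the temperature of 0.07, and the in-batch negatives are hypothetical and not taken from the ADS-Cap paper or its repository.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_row(logits: List[float], target: int) -> float:
    """Cross-entropy of one row of logits against the index of the positive pair."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def symmetric_info_nce(images: List[Vector], texts: List[Vector],
                       temperature: float = 0.07) -> float:
    """Image i and text i form a positive pair; every other pairing in the
    batch is a negative. Averages the image->text and text->image losses."""
    imgs = [normalize(v) for v in images]
    txts = [normalize(v) for v in texts]
    n = len(imgs)
    sim = [[dot(imgs[i], txts[j]) / temperature for j in range(n)] for i in range(n)]
    image_to_text = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    text_to_image = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2


# Toy batch: two image embeddings and two caption embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # caption i matches image i
swapped = [[0.1, 0.9], [0.9, 0.1]]  # captions paired with the wrong images
print(symmetric_info_nce(images, aligned) < symmetric_info_nce(images, swapped))  # True
```

Minimizing this loss pulls matched image/caption pairs together and pushes mismatched pairs apart; once trained, a text encoder aligned this way can stand in for the image encoder when only stylistic sentences are available.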

Findings

01. ADS-Cap outperforms baselines in style accuracy and diversity.
02. Contrastive learning effectively aligns image and text features.
03. The recheck module improves style consistency.
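Finding 03 credits the recheck module, which the abstract describes as boosting style accuracy by filtering style-specific captions. One plausible reading is sample-then-filter: draw a caption, score it with a style classifier, and resample if it falls short. The sketch below assumes exactly that; the keyword scorer, the 0.5 threshold, the retry budget, and the fallback rule are all hypothetical, and the paper's actual classifier and selection rule are not reproduced here.

```python
from typing import Callable, Iterator, Optional

# Hypothetical stand-in for a trained style classifier: half the number of
# "romantic" cue words in the caption, capped at 1.0.
ROMANTIC_CUES = {"love", "lovely", "heart", "dream", "sweet"}


def toy_style_score(caption: str) -> float:
    hits = sum(word in ROMANTIC_CUES for word in caption.lower().split())
    return min(1.0, hits / 2)


def recheck(
    sample_caption: Callable[[], str],
    style_score: Callable[[str], float],
    threshold: float = 0.5,
    max_tries: int = 10,
) -> Optional[str]:
    """Resample until a caption passes the style check.

    If no draw passes within max_tries, fall back to the highest-scoring
    candidate; return None only when max_tries is 0.
    """
    best: Optional[str] = None
    best_score = float("-inf")
    for _ in range(max_tries):
        caption = sample_caption()
        score = style_score(caption)
        if score >= threshold:
            return caption  # first caption judged to be in the target style
        if score > best_score:
            best, best_score = caption, score
    return best


# Stand-in for a stochastic captioner that produces a different caption per call.
draws: Iterator[str] = iter(["a dog runs on the grass", "a sweet dog runs on the grass"])
caption = recheck(lambda: next(draws), toy_style_score)
print(caption)  # a sweet dog runs on the grass
```

The trade-off is extra decoding in exchange for style accuracy, and the filter only helps if the generator produces varied candidates, which is why it pairs naturally with a sampling-based model.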

Abstract

Generating visually grounded image captions with specific linguistic styles using unpaired stylistic corpora is a challenging task, especially since we expect stylized captions with a wide variety of stylistic patterns. In this paper, we propose a novel framework to generate Accurate and Diverse Stylized Captions (ADS-Cap). Our ADS-Cap first uses a contrastive learning module to align the image and text features, which unifies paired factual and unpaired stylistic corpora during the training process. A conditional variational auto-encoder is then used to automatically memorize diverse stylistic patterns in latent space and enhance diversity through sampling. We also design a simple but effective recheck module to boost style accuracy by filtering style-specific captions. Experimental results on two widely used stylized image captioning datasets show that regarding consistency with the…
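The abstract's diversity mechanism (memorize stylistic patterns in a latent space, then sample) relies on two standard variational auto-encoder ingredients: the reparameterization trick during training, with a KL term KL = -1/2 Σ (1 + log σ² - μ² - σ²) against the prior, and repeated latent sampling at inference. The plain-Python sketch below shows only those ingredients under stated assumptions: a diagonal-Gaussian posterior, a standard normal prior N(0, I), and hypothetical function names and encoder outputs. How ADS-Cap conditions its CVAE (for example on style or image features) and weights its losses is not reproduced here.

```python
import math
import random
from typing import List

Vector = List[float]


def reparameterize(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """Training-time sample z = mu + sigma * eps with eps ~ N(0, I).

    In an autograd framework this form lets gradients flow into mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def sample_prior(dim: int, k: int, seed: int = 0) -> List[Vector]:
    """Inference-time draws from the assumed N(0, I) prior.

    Each latent code, fed to a (hypothetical) decoder together with the image
    features, would yield a different stylized caption for the same image.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


# Hypothetical encoder outputs for one stylized sentence.
mu, log_var = [0.3, -1.2, 0.0, 0.8], [-0.5, -0.5, 0.0, -1.0]
z = reparameterize(mu, log_var, random.Random(0))
print(len(z), kl_to_standard_normal(mu, log_var) > 0)  # 4 True
zs = sample_prior(dim=4, k=3)
print(len(zs), len(zs[0]))  # 3 4
```

Drawing several latent codes for one image is what turns a single deterministic caption into a set of diverse candidates.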

Code & Models

Repositories

njucckevin/ads-cap (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization

Methods: Contrastive Learning · ALIGN