SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

Alexander Mathews; Lexing Xie; Xuming He

arXiv:1805.07030 · cs.CV · May 21, 2018 · 19 citations

TL;DR

SemStyle is a model that learns from unaligned text to generate visually relevant, stylised image captions. It separates semantics from style, which lets it exploit large corpora of styled language.

Contribution

It introduces an approach to stylised caption generation that does not require aligned image-caption pairs; instead, it combines a concise semantic representation with a unified language model.
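
The separation of semantics and style can be illustrated with a minimal Python sketch. It is not the authors' implementation: `caption`, `toy_term_gen` and `toy_lang_gen` are hypothetical names, the two stubs stand in for trained neural networks, and the term format (noun lemmas plus a FrameNet-style frame name such as `Self_motion`) is an assumption made for illustration. What it shows is that the image only determines a style-free term sequence, while the style tag enters solely at the sentence-generation step, so one term sequence can be rendered in several styles.

```python
from typing import Callable, List

# Hypothetical interfaces: stage 1 maps an image to style-free semantic terms,
# stage 2 maps those terms plus a style tag to a sentence.
TermGenerator = Callable[[object], List[str]]
LanguageGenerator = Callable[[List[str], str], str]


def caption(image: object, style: str,
            term_gen: TermGenerator, lang_gen: LanguageGenerator) -> str:
    """Generate a styled caption by routing through a style-free term sequence."""
    terms = term_gen(image)        # semantics only: what is in the image
    return lang_gen(terms, style)  # style is applied only when producing words


def toy_term_gen(image: object) -> List[str]:
    # Toy stub (hypothetical): pretend every image shows a dog running on a beach.
    return ["dog", "Self_motion", "beach"]


def toy_lang_gen(terms: List[str], style: str) -> str:
    # Toy stub (hypothetical): one fixed template per style instead of a trained decoder.
    templates = {
        "descriptive": "A {0} is running along the {2}.",
        "romantic": "A joyful {0} races along the {2}, chasing the sunset.",
    }
    return templates[style].format(*terms)


if __name__ == "__main__":
    for style in ("descriptive", "romantic"):
        print(caption(None, style, toy_term_gen, toy_lang_gen))
```

Running the script prints "A dog is running along the beach." followed by "A joyful dog races along the beach, chasing the sunset.", i.e. the same scene in two styles. Because the term sequence carries no style, the language generator in this design can be trained on any styled text from which terms can be extracted, which is what removes the need for aligned image-caption pairs.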

Findings

1. Generated captions preserve image semantics while shifting in style.
2. Automatic and human evaluations confirm both visual relevance and style adaptation.
3. The model exploits large corpora of unaligned styled text.

Abstract

Linguistic style is an essential part of written communication, with the power to affect both clarity and attractiveness. With recent advances in vision and language, we can start to tackle the problem of generating image captions that are both visually grounded and appropriately styled. Existing approaches either require styled training captions aligned to images or generate captions with low relevance. We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images. The core idea of this model, called SemStyle, is to separate semantics and style. One key component is a novel and concise semantic term representation generated using natural language processing techniques and frame semantics. In addition, we develop a unified language model that decodes sentences with diverse word choices and syntax for different…
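
One way to learn style from text that has no paired images is sketched below: reduce each styled sentence to a concise term sequence, then train the language generator to reconstruct the sentence from those terms. This is a simplified illustration under stated assumptions, not SemStyle's actual extraction rules. Tokens are assumed to be POS-tagged and lemmatised upstream, `VERB_TO_FRAME` is a small hand-written, hypothetical stand-in for a real frame-semantic parser, and appending a `<STYLE>` tag token to the terms is an assumed way of conditioning on style.

```python
from typing import Dict, List, Tuple

# (surface form, coarse POS tag, lemma); tagging and lemmatisation are assumed
# to have been done upstream by an NLP toolkit.
Token = Tuple[str, str, str]

# Hypothetical toy lexicon standing in for a frame-semantic parser:
# verb lemma -> FrameNet-style frame name.
VERB_TO_FRAME: Dict[str, str] = {
    "run": "Self_motion",
    "race": "Self_motion",
    "sit": "Posture",
    "eat": "Ingestion",
}


def semantic_terms(tokens: List[Token]) -> List[str]:
    """Keep noun lemmas and verb frames; drop words that mostly carry style."""
    terms: List[str] = []
    for _surface, pos, lemma in tokens:
        if pos == "NOUN":
            terms.append(lemma)
        elif pos == "VERB":
            # Fall back to the bare lemma when the toy lexicon has no frame.
            terms.append(VERB_TO_FRAME.get(lemma, lemma))
        # Adjectives, adverbs, determiners, prepositions, etc. are dropped.
    return terms


def training_pair(tokens: List[Token], style: str) -> Tuple[List[str], str]:
    """Build a (source terms, target sentence) pair from styled text alone."""
    source = semantic_terms(tokens) + [f"<{style}>"]  # assumed style tag token
    target = " ".join(surface for surface, _pos, _lemma in tokens)
    return source, target


if __name__ == "__main__":
    sentence = [
        ("joyful", "ADJ", "joyful"),
        ("dogs", "NOUN", "dog"),
        ("race", "VERB", "race"),
        ("along", "ADP", "along"),
        ("the", "DET", "the"),
        ("beach", "NOUN", "beach"),
    ]
    print(training_pair(sentence, "ROMANTIC"))
```

Under these assumptions the example yields the source `['dog', 'Self_motion', 'beach', '<ROMANTIC>']` and the target `'joyful dogs race along the beach'`. The adjective "joyful" is dropped from the source, so the generator has to learn to reintroduce such style-bearing words itself; plain descriptive captions can be turned into pairs the same way with a different tag, letting one generator serve several styles.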

Code & Models

Repositories

computationalmedia/semstyle (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Language, Metaphor, and Cognition