Diverse and Styled Image Captioning Using SVD-Based Mixture of Recurrent Experts

Marzieh Heidari; Mehdi Ghatee; Ahmad Nickabadi; Arash Pourhasan Nezhad

arXiv:2007.03338·cs.CV·February 3, 2022·1 cites

TL;DR

This paper introduces MoRE (Mixture of Recurrent Experts), an image captioning model that uses an SVD-based mixture of recurrent experts to generate diverse, styled captions without requiring styled datasets. The model is evaluated on the Microsoft COCO dataset.

Contribution

The paper presents a new captioning approach combining SVD with RNNs to enhance diversity and style without additional labeled data.

Findings

01. Generates diverse, styled captions without styled datasets

02. Achieves improved content accuracy in captions

03. Validated on Microsoft COCO dataset

Abstract

With recent advances in computer vision and natural language processing, automatic image captioning has become increasingly important. In a recent paper, Mathews, Xie and He [1] introduced a model that generates styled captions by separating semantics and style. Continuing that work, a new captioning model is developed here, comprising an image encoder that extracts features, a mixture of recurrent networks that maps the extracted features to a set of words, and a sentence generator that combines those words into a stylized sentence. The resulting system, called Mixture of Recurrent Experts (MoRE), uses a new training algorithm that applies singular value decomposition (SVD) to the weight matrices of the Recurrent Neural Networks (RNNs) to increase the diversity of captions. Each decomposition step depends on a distinct factor determined by the number of RNNs in MoRE. Since the used…
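As a rough illustration of how an SVD step can make several recurrent experts differ from one another, the sketch below rebuilds one weight matrix at a different rank for each expert. It is not the paper's training algorithm: the abstract does not say how the per-expert factor is defined, so the schedule `(k + 1) / num_experts` and the helper names `low_rank_approx` and `expert_weights` are assumptions made only for this example.

```python
import numpy as np


def low_rank_approx(weight, keep_fraction):
    """Rebuild `weight` from only its largest singular values."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep at least one singular value, at most all of them.
    rank = max(1, int(round(keep_fraction * len(s))))
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return approx, rank


def expert_weights(weight, num_experts):
    """Hypothetical schedule: expert k keeps a fraction (k + 1) / num_experts
    of the singular values. The factor used in the paper is not given in the
    abstract, so this choice is an assumption."""
    return [
        low_rank_approx(weight, (k + 1) / num_experts)
        for k in range(num_experts)
    ]


# Toy stand-in for one RNN weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))

experts = expert_weights(w, num_experts=3)
ranks = [rank for _, rank in experts]
# Frobenius-norm distance between each expert's matrix and the original.
errors = [float(np.linalg.norm(w - approx)) for approx, _ in experts]

for k, (rank, err) in enumerate(zip(ranks, errors)):
    print(f"expert {k}: rank {rank}, reconstruction error {err:.4f}")
```

Under this assumed schedule each expert sees a progressively less compressed version of the same weights, which is one simple way a factor tied to the number of experts can produce differing behaviour across them.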

Code & Models

Repositories

annabeth-h/styled-and-diverse-image-captioning (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization