Phrase-based Image Captioning
Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert

TL;DR
This paper introduces a simple, bilinear model for image captioning that leverages phrase inference and syntax modeling to generate relevant descriptions, achieving competitive results on standard datasets.
Contribution
The paper presents a novel, straightforward bilinear approach that focuses on syntax and phrase inference for image captioning, differing from more complex state-of-the-art models.
Findings
Achieves comparable results to state-of-the-art models on Flickr30k and MS COCO datasets.
Uses a purely bilinear model trained on image and phrase representations.
Incorporates a syntax-based language model for caption generation.
Abstract
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representation (generated from a previously trained Convolutional Neural Network) and phrases that are used to describe them. The system is then able to infer phrases from a given image sample. Based on caption syntax statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results in two popular datasets for the task: Flickr30k and the…
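The bilinear scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensions, the random weights, the phrase list, and the helper names `bilinear_scores` and `top_k_phrases` are all hypothetical. It assumes an image feature vector z (standing in for CNN output), a projection matrix W, and one embedding u_p per phrase, and scores each phrase as u_p^T (W z).

```python
import random

def bilinear_scores(image_vec, phrase_vecs, W):
    """Score each phrase embedding u_p against the image as u_p^T (W z)."""
    # Project the image feature z into the phrase-embedding space: W z
    projected = [sum(w * z for w, z in zip(row, image_vec)) for row in W]
    # Dot product of every phrase embedding with the projected image
    return [sum(u * v for u, v in zip(phrase, projected)) for phrase in phrase_vecs]

def top_k_phrases(scores, phrases, k):
    """Return the k phrases with the highest scores, best first."""
    ranked = sorted(zip(scores, phrases), key=lambda sp: sp[0], reverse=True)
    return [phrase for _, phrase in ranked[:k]]

# Toy setup with random, untrained parameters (assumptions for illustration only).
rng = random.Random(0)
image_dim, embed_dim = 6, 4
phrases = ["a man", "a red ball", "is playing", "on the beach"]
image_vec = [rng.gauss(0, 1) for _ in range(image_dim)]  # stand-in for CNN features
W = [[rng.gauss(0, 1) for _ in range(image_dim)] for _ in range(embed_dim)]
phrase_vecs = [[rng.gauss(0, 1) for _ in range(embed_dim)] for _ in phrases]

scores = bilinear_scores(image_vec, phrase_vecs, W)
best = top_k_phrases(scores, phrases, k=2)
print(best)
```

In a trained system, W and the phrase embeddings would be learned so that phrases describing an image score higher than unrelated ones. The top-scoring phrases would then be passed to the syntax-based language model to assemble a caption.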
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
