Simple Image Description Generator via a Linear Phrase-Based Approach
Remi Lebret, Pedro O. Pinheiro, Ronan Collobert

TL;DR
This paper introduces a simple, bilinear, phrase-based model for image captioning that leverages CNN features and syntax-aware language modeling to generate relevant descriptions with competitive performance.
Contribution
The paper presents a straightforward bilinear approach with a strong focus on caption syntax, achieving results comparable to those of more complex models on the Microsoft COCO dataset.
Findings
Achieves competitive captioning results on the COCO dataset
Uses a simple bilinear model with CNN features and syntax-based language modeling
Demonstrates effectiveness of a less complex approach for image description
Abstract
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representation (generated from a previously trained Convolutional Neural Network) and the phrases that are used to describe it. The system is then able to infer phrases from a given image sample. Based on caption syntax statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results on the recently released Microsoft COCO dataset.
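The bilinear scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `score_phrases` and `top_k_phrases`, the matrix name `V`, and the toy dimensions are all assumptions made for illustration. The sketch assumes each phrase is already a fixed-length embedding vector and the image is a fixed-length CNN feature vector, and that the score is the dot product between the phrase embedding and a linear projection of the image feature.

```python
import numpy as np


def score_phrases(image_feat, phrase_embs, V):
    """Bilinear score for every candidate phrase against one image.

    image_feat:  (d_img,) CNN feature vector (assumed precomputed).
    phrase_embs: (n_phrases, d_txt) one embedding per phrase (assumed given).
    V:           (d_txt, d_img) hypothetical trainable projection matrix.
    Returns an (n_phrases,) array with score_p = phrase_embs[p] . (V @ image_feat).
    """
    projected = V @ image_feat      # map the image into the phrase space
    return phrase_embs @ projected  # one dot product per phrase


def top_k_phrases(scores, k):
    """Indices of the k highest-scoring phrases, best first."""
    order = np.argsort(-scores, kind="stable")
    return order[:k]


# Toy example with hand-picked values (not real data).
V = np.eye(2)
image_feat = np.array([1.0, 0.0])
phrase_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
scores = score_phrases(image_feat, phrase_embs, V)
best = top_k_phrases(scores, k=2)
```

In this toy setup the first phrase scores highest because its embedding aligns with the projected image vector. In a trained system, `V` and the phrase embeddings would be learned so that phrases describing an image score above unrelated ones, and the top-scoring phrases would then be passed to the language model that assembles the final sentence.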
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Natural Language Processing Techniques
