KENGIC: KEyword-driven and N-Gram Graph based Image Captioning

Brandon Birmingham; Adrian Muscat

arXiv:2302.03729 · cs.CV · February 9, 2023 · 1 citation

Open Access

TL;DR

KENGIC is a keyword-driven, N-gram graph-based image captioning method that avoids end-to-end training, offering a more explainable and domain-adaptable alternative to current models with comparable performance.

Contribution

This paper introduces an image captioning approach built on N-gram graphs and keywords that requires no end-to-end training on paired image-caption data, with the aim of improving explainability and applicability across domains.

Findings

01

Performance close to state-of-the-art models

02

Effective with both gold standard and detected keywords

03

Provides insights into caption generation and evaluation metrics

Abstract

This paper presents a Keyword-driven and N-gram Graph based approach for Image Captioning (KENGIC). Most current state-of-the-art image caption generators are trained end-to-end on large-scale paired image-caption datasets, which are very laborious and expensive to collect. Such models are limited in terms of their explainability and their applicability across different domains. To address these limitations, a simple model based on N-Gram graphs which does not require any end-to-end training on paired image captions is proposed. Starting with a set of image keywords considered as nodes, the generator is designed to form a directed graph by connecting these nodes through overlapping n-grams as found in a given text corpus. The model then infers the caption by maximising the most probable n-gram sequences from the constructed graph. To analyse the use and choice of keywords in context of…
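The graph construction described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the corpus, the function names (`build_graph`, `caption`), the choice of trigrams, the exhaustive path search, and the scoring rule (keyword coverage first, then mean n-gram log-probability) are all assumptions made for illustration, since the abstract excerpt does not specify them.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for "a given text corpus".
CORPUS = [
    "a dog is running on the beach",
    "a dog is playing with a ball",
    "a man is running on the beach",
    "a cat is sitting on the sofa",
]

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_graph(corpus, n=3):
    """Count n-grams and link each one to the n-grams it overlaps with.

    An edge a -> b exists when the last n-1 tokens of a equal the
    first n-1 tokens of b, so following an edge appends one word.
    """
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence.lower().split(), n))
    total = sum(counts.values())
    logp = {g: math.log(c / total) for g, c in counts.items()}
    by_prefix = {}
    for g in counts:
        by_prefix.setdefault(g[:-1], []).append(g)
    edges = {g: sorted(by_prefix.get(g[1:], [])) for g in counts}
    return logp, edges

def caption(keywords, corpus, n=3, max_len=8):
    """Pick the n-gram path that covers the most keywords,
    breaking ties by the highest mean log-probability (assumed rule)."""
    keywords = set(keywords)
    logp, edges = build_graph(corpus, n)
    best_score, best_words = (-1, -math.inf), []
    # Start only from n-grams that contain at least one keyword.
    stack = [([g], logp[g]) for g in sorted(logp) if keywords & set(g)]
    while stack:
        path, lp = stack.pop()
        words = list(path[0]) + [g[-1] for g in path[1:]]
        score = (len(keywords & set(words)), lp / len(path))
        if score > best_score:
            best_score, best_words = score, words
        if len(path) < max_len:
            for nxt in edges[path[-1]]:
                if nxt not in path:  # keep paths simple (no cycles)
                    stack.append((path + [nxt], lp + logp[nxt]))
    return " ".join(best_words)

print(caption(["dog", "beach"], CORPUS))  # a dog is running on the beach
```

The exhaustive search is only workable on a corpus this small; a realistic corpus would need pruning or a beam, and the scoring rule would have to follow the paper's own definition rather than the placeholder used here.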

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling