KENGIC: KEyword-driven and N-Gram Graph based Image Captioning
Brandon Birmingham, Adrian Muscat

TL;DR
KENGIC is a keyword-driven, N-gram graph-based image captioning method that requires no end-to-end training on paired image-caption data. It is more explainable and easier to adapt across domains than current end-to-end models, while reaching comparable performance.
Contribution
This paper introduces a novel image captioning approach that builds captions from keywords and N-gram graphs without end-to-end training on paired image-caption data, improving explainability and domain flexibility.
Findings
Performance close to state-of-the-art models
Effective with both gold standard and detected keywords
Provides insights into caption generation and evaluation metrics
Abstract
This paper presents a Keyword-driven and N-gram Graph based approach for Image Captioning (KENGIC). Most current state-of-the-art image caption generators are trained end-to-end on large scale paired image-caption datasets which are very laborious and expensive to collect. Such models are limited in terms of their explainability and their applicability across different domains. To address these limitations, a simple model based on N-Gram graphs which does not require any end-to-end training on paired image captions is proposed. Starting with a set of image keywords considered as nodes, the generator is designed to form a directed graph by connecting these nodes through overlapping n-grams as found in a given text corpus. The model then infers the caption by maximising the most probable n-gram sequences from the constructed graph. To analyse the use and choice of keywords in context of…
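The graph construction and inference steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy corpus, the function names (`build_graph`, `generate_caption`), the use of bigrams, the depth-first search, and the scoring rule (mean log relative frequency of the n-grams on a path) are all assumptions made for illustration. The paper's actual cost function and search strategy may differ.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the text corpus mentioned in the abstract.
CORPUS = [
    "a dog runs on the grass",
    "a dog runs on the beach",
    "a dog sits on the grass",
    "a cat sits on the sofa",
]

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_graph(corpus, n=2):
    """Count n-grams and link each one to the n-grams that overlap it by n-1 tokens."""
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence.lower().split(), n))
    by_prefix = defaultdict(list)
    for gram in counts:
        by_prefix[gram[:n - 1]].append(gram)
    # Edge gram -> nxt exists when the suffix of gram equals the prefix of nxt.
    edges = {gram: by_prefix[gram[1:]] for gram in counts}
    return counts, edges

def generate_caption(keywords, corpus, n=2, max_tokens=8):
    """Return the best-scoring token sequence that covers every keyword, or None."""
    counts, edges = build_graph(corpus, n)
    total = sum(counts.values())
    keywords = set(keywords)
    best_tokens, best_score = None, -math.inf

    def tokens_of(path):
        # Merge overlapping n-grams back into a flat token sequence.
        return list(path[0]) + [gram[-1] for gram in path[1:]]

    # Start a path at every n-gram that contains at least one keyword.
    stack = [[gram] for gram in counts if keywords & set(gram)]
    while stack:
        path = stack.pop()
        tokens = tokens_of(path)
        if keywords <= set(tokens):
            # Assumed scoring rule: mean log relative frequency along the path.
            score = sum(math.log(counts[g] / total) for g in path) / len(path)
            if score > best_score:
                best_tokens, best_score = tokens, score
        if len(tokens) < max_tokens:
            for nxt in edges[path[-1]]:
                if nxt not in path:  # do not revisit an n-gram within one path
                    stack.append(path + [nxt])
    return " ".join(best_tokens) if best_tokens else None

print(generate_caption({"dog", "grass"}, CORPUS))  # a dog runs on the grass
```

In this sketch, every step is inspectable: the selected caption can be traced back to specific n-grams and their corpus counts, which is the kind of explainability the abstract attributes to the approach. Swapping `CORPUS` for text from another domain changes the output without any retraining.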
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling
