Uncertainty-Aware Image Captioning

Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

arXiv:2211.16769 · cs.CV · December 1, 2022

TL;DR

This paper introduces an uncertainty-aware image captioning framework that iteratively inserts words based on their uncertainty levels, improving caption quality and decoding speed through a non-autoregressive hierarchy.

Contribution

It proposes a novel parallel and iterative captioning method that accounts for word uncertainty, enhancing explainability and efficiency over traditional sequential models.

Findings

01. Outperforms the baseline in caption quality on MS COCO
02. Achieves faster decoding with logarithmic time complexity
03. Provides a more explainable caption generation process
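The logarithmic-time claim can be made concrete with an idealized count. Suppose (a simplification of ours, not the paper's schedule) that each iteration may insert one word into every gap between already-placed words. A midpoint-first order then fills an n-word caption in ceil(log2(n + 1)) iterations, whereas left-to-right decoding needs n steps. The function `balanced_insertion_stages` is hypothetical and ignores word uncertainty entirely.

```python
import math


def balanced_insertion_stages(n: int) -> list[int]:
    """Return, for each of n caption positions, the iteration (1-based) at which
    it is inserted under an idealized midpoint-first schedule. Hypothetical
    illustration: every gap between placed words receives one word per step."""
    stages = [0] * n

    def assign(lo: int, hi: int, stage: int) -> None:
        # Fill the still-empty span lo..hi (inclusive): place its middle word in
        # this iteration and leave the two halves for the next iteration.
        if lo > hi:
            return
        mid = (lo + hi) // 2
        stages[mid] = stage
        assign(lo, mid - 1, stage + 1)
        assign(mid + 1, hi, stage + 1)

    assign(0, n - 1, 1)
    return stages


for n in (1, 3, 7, 12, 20):
    parallel_steps = max(balanced_insertion_stages(n))
    # Compare against the closed form; sequential decoding would need n steps.
    print(n, parallel_steps, math.ceil(math.log2(n + 1)))
```

This is a best case: a real uncertainty-driven order will rarely split every gap evenly, so the step count is bounded below by this figure rather than equal to it.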

Abstract

It is widely believed that the higher the uncertainty of a word in a caption, the more inter-correlated context information is required to determine it. However, current image captioning methods usually generate all words in a sentence sequentially and treat them equally. In this paper, we propose an uncertainty-aware image captioning framework that iteratively inserts discontinuous candidate words between existing words in parallel, from easy to difficult, until convergence. We hypothesize that high-uncertainty words in a sentence need more prior information to be decided correctly and should therefore be produced at a later stage. The resulting non-autoregressive hierarchy makes caption generation explainable and intuitive. Specifically, we utilize an image-conditioned bag-of-words model to measure word uncertainty and apply a dynamic programming algorithm to…
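The easy-to-difficult insertion order described in the abstract can be illustrated with a small schedule builder. Two assumptions, neither taken from the paper: uncertainty is scored as the binary entropy of a per-word presence probability (standing in for the image-conditioned bag-of-words model), and the schedule is built greedily, letting each gap between placed words receive its lowest-uncertainty remaining word, instead of with the paper's dynamic programming algorithm. The probabilities are invented for the example, and `binary_entropy` and `insertion_stages` are hypothetical names.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) word-presence prediction (assumed
    uncertainty measure; the paper's exact measure may differ)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def insertion_stages(words: list[str], presence_prob: dict[str, float]) -> list[list[str]]:
    """Greedy easy-to-difficult schedule (a stand-in, not the paper's DP):
    at each stage, every gap of unplaced words receives its most certain word."""
    unc = [binary_entropy(presence_prob.get(w, 0.5)) for w in words]
    placed = [False] * len(words)
    stages = []
    while not all(placed):
        new = []
        i = 0
        while i < len(words):
            if placed[i]:
                i += 1
                continue
            # [i, j) is a maximal run of unplaced positions, i.e. one gap.
            j = i
            while j < len(words) and not placed[j]:
                j += 1
            new.append(min(range(i, j), key=lambda k: unc[k]))
            i = j
        for k in new:
            placed[k] = True
        # Record the partial caption after this stage, in sentence order.
        stages.append([w for w, done in zip(words, placed) if done])
    return stages


# Hypothetical presence probabilities for a hypothetical target caption.
words = "two dogs play in the snow".split()
probs = {"two": 0.6, "dogs": 0.97, "play": 0.7, "in": 0.8, "the": 0.9, "snow": 0.93}
stages = insertion_stages(words, probs)
for s in stages:
    print(" ".join(s))
# dogs
# two dogs snow
# two dogs the snow
# two dogs in the snow
# two dogs play in the snow
```

Here six words need five stages because the most certain word in each gap tends to sit near its edge, so the splits are unbalanced; scores that split gaps more evenly would need fewer stages.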

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization