Efficient Modeling of Future Context for Image Captioning

Zhengcong Fei; Junshi Huang; Xiaoming Wei; Xiaolin Wei

arXiv:2207.10897 · cs.CV · October 19, 2022 · 1 citation

TL;DR

This paper introduces a novel method to incorporate future context into autoregressive image captioning models by leveraging ideas from non-autoregressive models, resulting in improved captioning performance without additional inference cost.

Contribution

It proposes a training framework that enables autoregressive models to utilize future context effectively, combining shared visual encoders and a teacher-student paradigm for enhanced captioning.
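The two ingredients named above, a shared visual encoder and a teacher-student paradigm, can be pictured as two training objectives. The following is an illustrative sketch only. It assumes per-position softmax outputs from each decoder, a summed cross-entropy for the joint stage, and a KL-based word-level distillation term in which the NAIC model (which sees future context) acts as teacher for the AIC student. The loss forms, the weighting and the parameter `alpha` are hypothetical, not the paper's published formulation or the repository's code.

```python
import math
from typing import List, Sequence

Dist = Sequence[float]  # a softmax distribution over a toy vocabulary


def cross_entropy(probs: Dist, target: int) -> float:
    """Negative log-likelihood of the reference word under one predicted distribution."""
    return -math.log(probs[target])


def kl_divergence(p: Dist, q: Dist) -> float:
    """KL(p || q); vocabulary entries with p_k == 0 contribute nothing."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def joint_loss(aic: List[Dist], naic: List[Dist], targets: List[int]) -> float:
    """Stage 1 (assumed form): the AIC and NAIC decoders sit on one shared visual
    encoder, so summing their caption losses sends gradients from both into it."""
    l_aic = sum(cross_entropy(p, t) for p, t in zip(aic, targets))
    l_naic = sum(cross_entropy(p, t) for p, t in zip(naic, targets))
    return l_aic + l_naic


def distill_loss(student: List[Dist], teacher: List[Dist], targets: List[int],
                 alpha: float = 0.5) -> float:
    """Stage 2 (assumed form): word-level distillation from the NAIC teacher, whose
    predictions used future context, into the AIC student. `alpha` is hypothetical."""
    ce = sum(cross_entropy(s, t) for s, t in zip(student, targets))
    kd = sum(kl_divergence(t_dist, s_dist) for s_dist, t_dist in zip(student, teacher))
    return (1 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    targets = [0, 1]                               # toy two-word caption
    student = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]]   # AIC outputs per position
    teacher = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]   # NAIC outputs per position
    print(round(joint_loss(student, teacher, targets), 4))
    print(round(distill_loss(student, teacher, targets), 4))
```

Taking the teacher's distribution as the reference in the KL term is the usual convention in knowledge distillation; this summary does not say which direction or weighting the authors use.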

Findings

01. Outperforms state-of-the-art baselines on MS COCO
02. Improves automatic metrics and human evaluation scores
03. Maintains inference efficiency without extra time cost

Abstract

Existing approaches to image captioning usually generate a sentence word by word from left to right, conditioned only on local context: the given image and the previously generated words. Many studies have tried to exploit global information during decoding, e.g., through iterative refinement. However, how to incorporate future context effectively and efficiently remains under-explored. To address this issue, and inspired by the fact that Non-Autoregressive Image Captioning (NAIC) can leverage two-sided relations through a modified mask operation, we aim to graft this advantage onto the conventional Autoregressive Image Captioning (AIC) model while maintaining inference efficiency, with no extra time cost. Specifically, the AIC and NAIC models are first trained jointly with a shared visual encoder, forcing the visual encoder to contain sufficient and valid future…
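The abstract contrasts the left-to-right conditioning of AIC with the two-sided view that NAIC obtains from a modified mask. A minimal sketch of that difference, assuming standard Transformer-style boolean self-attention masks over caption positions (True means "may attend") and taking the NAIC mask to be fully visible. The function names are hypothetical and are not taken from the feizc/future-caption repository.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(n: int) -> Mask:
    """AIC-style mask: position i may attend only to positions j <= i (its history)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> Mask:
    """NAIC-style mask (assumed): every position may attend to every other,
    so each word sees both its left (history) and right (future) context."""
    return [[True] * n for _ in range(n)]


def visible_future(mask: Mask, i: int) -> List[int]:
    """Indices j > i (future positions) that position i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed and j > i]


if __name__ == "__main__":
    n = 4
    print("AIC ", [visible_future(causal_mask(n), i) for i in range(n)])
    # -> AIC  [[], [], [], []]
    print("NAIC", [visible_future(bidirectional_mask(n), i) for i in range(n)])
    # -> NAIC [[1, 2, 3], [2, 3], [3], []]
```

On this reading, only the causal path would be run at inference, which fits the stated goal of no extra decoding time; the bidirectional mask would matter only during training.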

Code & Models

Repositories

feizc/future-caption (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning