Quantifying the amount of visual information used by neural caption generators

Marc Tanti; Albert Gatt; Kenneth P. Camilleri

arXiv:1810.05475·cs.NE·February 5, 2019

TL;DR

This paper investigates how neural image caption generators use visual information. It finds that their sensitivity to the image varies with the type of word being generated and with the position in the caption, and it frames the analysis as a step toward more explainable AI.

Contribution

The paper reports a sensitivity analysis and an omission analysis based on image foils, showing that caption generators rely on visual input to different degrees depending on the type of word being generated and its position in the caption.

Findings

01 Sensitivity to visual input varies with the type of word being generated

02 The amount of visual information retained varies with the position in the caption

03 The analysis contributes to explainability in neural captioning models

Abstract

This paper addresses the sensitivity of neural image caption generators to their visual input. A sensitivity analysis and an omission analysis based on image foils are reported, showing that the extent to which image captioning architectures retain and are sensitive to visual information varies depending on the type of word being generated and the position in the caption as a whole. We motivate this work in the context of broader goals in the field to achieve more explainability in AI.
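To make the two analyses named in the abstract concrete, here is a minimal sketch on a toy model. It is not the paper's implementation. The linear-plus-softmax "caption generator", the function names, the use of total variation distance for the omission score, and the finite-difference gradient are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def next_word_probs(image, prefix, W_img, W_txt):
    """Hypothetical toy stand-in for a caption generator: linear layer plus softmax."""
    return softmax(W_img @ image + W_txt @ prefix)

def omission_score(image, foil, prefix, W_img, W_txt):
    """Assumed measure: total variation distance between the next-word
    distributions obtained with the real image and with a foil image."""
    p = next_word_probs(image, prefix, W_img, W_txt)
    q = next_word_probs(foil, prefix, W_img, W_txt)
    return float(0.5 * np.abs(p - q).sum())

def sensitivity(image, prefix, W_img, W_txt, eps=1e-5):
    """Norm of the finite-difference gradient of the most probable word's
    probability with respect to the image vector."""
    word = int(next_word_probs(image, prefix, W_img, W_txt).argmax())
    grad = np.zeros_like(image)
    for i in range(image.size):
        step = np.zeros_like(image)
        step[i] = eps
        hi = next_word_probs(image + step, prefix, W_img, W_txt)[word]
        lo = next_word_probs(image - step, prefix, W_img, W_txt)[word]
        grad[i] = (hi - lo) / (2 * eps)
    return float(np.linalg.norm(grad))

# Random toy data; the sizes are arbitrary.
rng = np.random.default_rng(0)
vocab, img_dim, txt_dim = 5, 4, 3
W_img = rng.normal(size=(vocab, img_dim))
W_txt = rng.normal(size=(vocab, txt_dim))
image = rng.normal(size=img_dim)
foil = rng.normal(size=img_dim)
prefix = rng.normal(size=txt_dim)

same = omission_score(image, image, prefix, W_img, W_txt)      # image replaced by itself
swapped = omission_score(image, foil, prefix, W_img, W_txt)    # image replaced by a foil
blind = sensitivity(image, prefix, np.zeros_like(W_img), W_txt)  # model ignores the image
sighted = sensitivity(image, prefix, W_img, W_txt)
```

In this sketch, a model that ignores the image gets a sensitivity of zero, and swapping in a foil changes the prediction only to the extent that the model uses visual input. Repeating such measurements per word type and per caption position would give the kind of breakdown the abstract describes.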

Code & Models

Repositories

mtanti/quantifing-visual-information (tf, Official)
