DeViL: Decoding Vision features into Language

Meghal Dani; Isabel Rio-Torto; Stephan Alaniz; Zeynep Akata

arXiv:2309.01617 · cs.CV · September 6, 2023

TL;DR

DeViL decodes vision features into natural language descriptions at different network layers, providing interpretable, localized explanations of vision models using a transformer-based approach that generalizes across vision backbones.

Contribution

Introduces DeViL, a method that translates vision features into language for layer-wise interpretability, leveraging a transformer and pre-trained language model for fast, open-vocabulary explanations.

Findings

01. Outperforms previous lightweight captioning models on CC3M.

02. Generates textual descriptions relevant to the image content from vision features at different layers.

03. Achieves state-of-the-art neuron-wise descriptions on MILANNOTATIONS.

Abstract

Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks. In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned. Our DeViL method decodes vision features into language, not only highlighting the attribution locations but also generating textual descriptions of visual features at different layers of the network. We train a transformer network to translate individual image features of any vision layer into a prompt that a separate off-the-shelf language model decodes into natural language. By employing dropout both per-layer and per-spatial-location, our model can generalize training on image-text pairs to generate localized explanations. As it uses a pre-trained language model, our approach is fast to train, can be applied to any vision…
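The per-layer and per-spatial-location dropout mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`sample_dropout_masks`, `count_kept`), the drop probabilities, the layer names and grid sizes, and the rule of always keeping at least one layer and one location are all assumptions made for illustration. The sketch only samples which features would be passed on to the translation transformer during one training step.

```python
import random
from typing import Dict, List, Optional, Tuple

# mask[row][col] is True where a spatial feature is kept.
Mask = List[List[bool]]


def sample_dropout_masks(
    layer_shapes: Dict[str, Tuple[int, int]],
    p_layer: float,
    p_location: float,
    rng: random.Random,
) -> Dict[str, Optional[Mask]]:
    """Sample keep-masks per layer; None means the whole layer is dropped."""
    names = list(layer_shapes)
    # Assumption (not from the paper): always keep at least one layer,
    # so the translator never receives an empty input.
    forced = rng.choice(names)
    masks: Dict[str, Optional[Mask]] = {}
    for name in names:
        height, width = layer_shapes[name]
        # Per-layer dropout: drop all features of this layer at once.
        if name != forced and rng.random() < p_layer:
            masks[name] = None
            continue
        # Per-spatial-location dropout: drop individual grid positions.
        mask = [
            [rng.random() >= p_location for _ in range(width)]
            for _ in range(height)
        ]
        if not any(any(row) for row in mask):
            # Assumption: keep one random location rather than none.
            mask[rng.randrange(height)][rng.randrange(width)] = True
        masks[name] = mask
    return masks


def count_kept(masks: Dict[str, Optional[Mask]]) -> Dict[str, int]:
    """Number of kept spatial features per layer (0 for a dropped layer)."""
    return {
        name: 0 if mask is None else sum(sum(row) for row in mask)
        for name, mask in masks.items()
    }


if __name__ == "__main__":
    # Hypothetical feature-map sizes for three layers of a vision backbone.
    shapes = {"layer2": (28, 28), "layer3": (14, 14), "layer4": (7, 7)}
    masks = sample_dropout_masks(shapes, 0.5, 0.5, random.Random(0))
    print(count_kept(masks))
```

With both probabilities set to their extreme of 1.0, this sketch passes on a single feature from a single layer. That matches the setting the abstract points to, in which a model trained on whole image-text pairs is asked to describe one localized feature.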

Code & Models

Repositories

ExplainableML/DeViL (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Dropout