With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

Manuele Barraco; Sara Sarto; Marcella Cornia; Lorenzo Baraldi; Rita Cucchiara

arXiv:2308.12383 · cs.CV · August 25, 2023

TL;DR

This paper introduces a prototypical memory network for image captioning that enhances Transformer models by incorporating semantic information from other samples, leading to improved performance on the COCO dataset.

Contribution

The paper proposes a novel prototypical memory mechanism for attention in image captioning, capturing semantic information from multiple samples to boost Transformer performance.
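
The core idea of attending over more than the current sample can be illustrated with a minimal NumPy sketch. It assumes, as in other memory-augmented attention designs, that a set of memory keys and values (standing in for the prototypes) is simply appended to the current sample's own keys and values before scaled dot-product attention. The function name `memory_attention` and the arguments `mem_k` and `mem_v` are hypothetical, and the memory here is random rather than learned or derived from training data; this is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_attention(q, k, v, mem_k, mem_v):
    """Scaled dot-product attention with extra memory slots (hypothetical sketch).

    q:            (n_q, d)  queries of the current sample
    k, v:         (n_kv, d) keys and values of the current sample
    mem_k, mem_v: (m, d)    hypothetical prototype keys/values shared across samples
    """
    d = q.shape[-1]
    # Append memory slots so every query can also attend to them
    keys = np.concatenate([k, mem_k], axis=0)      # (n_kv + m, d)
    values = np.concatenate([v, mem_v], axis=0)    # (n_kv + m, d)
    weights = softmax(q @ keys.T / np.sqrt(d))     # (n_q, n_kv + m)
    return weights @ values, weights

# Usage with random data: 5 queries, 10 sample keys/values, 4 memory slots, d = 8
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))
k = rng.normal(size=(10, 8))
v = rng.normal(size=(10, 8))
mem_k = rng.normal(size=(4, 8))
mem_v = rng.normal(size=(4, 8))
out, weights = memory_attention(q, k, v, mem_k, mem_v)
print(out.shape, weights.shape)  # (5, 8) (5, 14)
```

With zero memory slots the function reduces to standard self-attention; the memory only widens the set of positions each query can draw information from.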

Findings

01

Achieved a 3.7-point CIDEr improvement on the COCO dataset

02

Enhanced Transformer-based captioning with attention that also draws on information from other training samples

03

Demonstrated effectiveness of prototype-based memory in vision-language tasks

Abstract

Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics in an image and translating it into linguistically coherent descriptions. Although successful, the attention operator only considers a weighted summation of projections of the current input sample, therefore ignoring the relevant semantic information which can come from the joint observation of other samples. In this paper, we devise a network which can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors which are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed…
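
One way to make a memory of past keys and values "discriminative and compact" is to summarize it with a small set of cluster centers. The sketch below makes a simplifying assumption: key prototypes are obtained with a few iterations of plain k-means over a bank of past keys, and each value prototype is the mean of the values whose keys fall in that cluster. The paper's exact construction may differ. The names `kmeans` and `build_prototypes` are hypothetical helpers written for illustration only.

```python
import numpy as np

def kmeans(x, n_clusters, n_iters=10, seed=0):
    # Plain Lloyd's k-means, initialized with randomly chosen data points
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, C)
        assign = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = x[assign == c]
            if len(members) > 0:  # empty clusters keep their previous center
                centers[c] = members.mean(axis=0)
    # Final assignment against the final centers
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, dists.argmin(axis=1)

def build_prototypes(past_keys, past_values, n_protos):
    """Hypothetical prototype construction from a memory bank of past keys/values."""
    key_protos, assign = kmeans(past_keys, n_protos)
    value_protos = np.zeros((n_protos, past_values.shape[1]))
    for c in range(n_protos):
        members = past_values[assign == c]
        if len(members) > 0:  # an empty cluster gets a zero value prototype
            value_protos[c] = members.mean(axis=0)
    return key_protos, value_protos

# Usage: two well-separated groups of past keys with constant values per group
rng = np.random.default_rng(1)
past_keys = np.concatenate([rng.normal(10.0, 0.5, size=(50, 2)),
                            rng.normal(-10.0, 0.5, size=(50, 2))])
past_values = np.concatenate([np.ones((50, 3)), -np.ones((50, 3))])
key_protos, value_protos = build_prototypes(past_keys, past_values, n_protos=2)
print(np.sort(value_protos[:, 0]))  # [-1.  1.]
```

The resulting `key_protos` and `value_protos` can then be appended to the keys and values of each attention layer, so that a bank of many past activations is replaced by a handful of representative vectors.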

Code & Models

Repositories

aimagelab/pma-net
PyTorch · Official

Videos

With a Little Help from Your Own Past: Prototypical Memory Networks for Image Captioning (YouTube)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dropout · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Layer Normalization · Dense Connections