Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze

Ece Takmaz; Sandro Pezzelle; Lisa Beinborn; Raquel Fernández

arXiv:2011.04592·cs.CL·November 10, 2020

TL;DR

This paper introduces a sequential cross-modal alignment model for image captioning that leverages human gaze data to produce more natural and speaker-aligned descriptions, highlighting the importance of temporal gaze information in visual language tasks.

Contribution

The paper presents what the authors describe as the first image description model in which visual processing is modelled sequentially, and it reports improved description quality when gaze data is integrated through a recurrent attention mechanism.

Findings

1. Gaze-driven models produce more natural and diverse descriptions.

2. Sequential processing of gaze data enhances alignment with human descriptions.

3. Gaze encoding with recurrent components improves caption quality.

Abstract

When speakers describe an image, they tend to look at objects before mentioning them. In this paper, we investigate such sequential cross-modal alignment by modelling the image description generation process computationally. We take as our starting point a state-of-the-art image captioning system and develop several model variants that exploit information from human gaze patterns recorded during language production. In particular, we propose the first approach to image description generation where visual processing is modelled sequentially. Our experiments and analyses confirm that better descriptions can be obtained by exploiting gaze-driven attention and shed light on human cognitive processes by comparing different ways of aligning the gaze modality with language production. We find that processing gaze data sequentially leads to descriptions that are better aligned to…
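
To make the idea of sequential, gaze-driven visual processing more concrete, the following is a minimal toy sketch and not the authors' architecture. It assumes an image is represented as a handful of region feature vectors and that gaze is available as per-region dwell weights for consecutive time windows. The function names, the `decay` parameter, and the simple tanh recurrence are all hypothetical choices made for illustration.

```python
import math

def normalise(weights):
    """Scale non-negative gaze weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def gaze_weighted_feature(region_features, gaze_weights):
    """Average the region feature vectors, weighted by one window of gaze."""
    attn = normalise(gaze_weights)
    dim = len(region_features[0])
    return [
        sum(a * region[d] for a, region in zip(attn, region_features))
        for d in range(dim)
    ]

def encode_gaze_sequence(region_features, gaze_windows, decay=0.5):
    """Toy recurrent encoder (an assumption, not the paper's model):
    h_t = tanh(decay * h_{t-1} + x_t), where x_t is the gaze-weighted
    image feature for time window t."""
    hidden = [0.0] * len(region_features[0])
    states = []
    for window in gaze_windows:
        x_t = gaze_weighted_feature(region_features, window)
        hidden = [math.tanh(decay * h + x) for h, x in zip(hidden, x_t)]
        states.append(hidden)
    return states

# Three image regions with 2-d features (illustrative values only).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Gaze dwell per region in two consecutive time windows (illustrative).
windows = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
states = encode_gaze_sequence(regions, windows)
```

Each entry of `states` summarises what has been looked at up to that time window, so a decoder could condition each generated word on a visual state that changes over time rather than on a single static image representation.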

Code & Models

Repositories

dmg-illc/didec-seq-gen (PyTorch, official)
