Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations

Fatma Youssef Mohammed; Kostas Alexis

arXiv:2506.02764·cs.CV·June 4, 2025

TL;DR

This paper proposes a shared neural network architecture that models human attention for both free-viewing and visual search, enabling efficient transfer of learned representations with minimal performance loss and significant computational savings.

Contribution

The paper introduces a unified attention model built on the Human Attention Transformer (HAT) and shows that free-viewing and visual search can share a common representation, which cuts training cost while keeping scanpath prediction accuracy close to that of the task-specific model.

Findings

01

A model trained on free-viewing attention transfers to visual search with a drop of only 3.86% in semantic sequence score (SemSS).

02

The transfer reduces computational cost by 92.29% in GFLOPs and by 31.23% in trainable parameters.

03

After transfer, the predicted fixation scanpaths remain highly similar to human scanpaths.

Abstract

Computational modeling of human attention in free-viewing and task-specific settings is often studied separately, with limited exploration of whether the two share a common representation. This work investigates that question and proposes a neural network architecture, built upon the Human Attention Transformer (HAT), to test the hypothesis. Our results demonstrate that free-viewing and visual search can efficiently share a common representation, allowing a model trained on free-viewing attention to transfer its knowledge to task-driven visual search with a performance drop of only 3.86% in the predicted fixation scanpaths. Performance is measured by the semantic sequence score (SemSS) metric, which reflects the similarity between predicted and human scanpaths. This transfer reduces computational costs by 92.29% in terms of GFLOPs and 31.23% in terms of trainable parameters.
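The percentages in the abstract read most naturally as relative changes with respect to a baseline model, although the abstract does not state the exact formula. A minimal sketch of that arithmetic, assuming a relative-reduction definition; the function name `relative_reduction` and the numeric inputs are hypothetical placeholders and are not taken from the paper:

```python
def relative_reduction(baseline: float, transferred: float) -> float:
    """Return the percentage by which `transferred` falls below `baseline`.

    Assumed definition: 100 * (baseline - transferred) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - transferred) / baseline


# Hypothetical placeholder values, NOT figures reported in the paper.
baseline_gflops = 200.0
transfer_gflops = 20.0

print(f"{relative_reduction(baseline_gflops, transfer_gflops):.2f}% fewer GFLOPs")
```

The same helper would apply to a SemSS drop or a trainable-parameter count, provided the baseline is the task-specific model in each case.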

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Image Retrieval and Classification Techniques