Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding

Miguel-Ángel Fernández-Torres

arXiv:2308.05189·cs.CV·August 11, 2023

TL;DR

This thesis develops hierarchical models for spatio-temporal visual attention in videos, introducing a probabilistic model and a deep network architecture to improve understanding and estimation of attention over time.

Contribution

It proposes two novel computational models—one probabilistic and one deep learning-based—for context-aware and top-down spatio-temporal visual attention modeling.

Findings

01. The probabilistic model effectively captures context-aware attention.

02. The deep network accurately estimates top-down attention in videos.

03. Both models enhance understanding of visual attention dynamics.

Abstract

This PhD thesis concerns the study and development of hierarchical representations for spatio-temporal visual attention modeling and understanding in video sequences. More specifically, we propose two computational models for visual attention. First, we present a generative probabilistic model for context-aware visual attention modeling and understanding. Second, we develop a deep network architecture for visual attention modeling, which first estimates top-down spatio-temporal visual attention and ultimately serves to model attention in the temporal domain.

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications