Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding
Miguel-Ángel Fernández-Torres

TL;DR
This thesis develops hierarchical models for spatio-temporal visual attention in videos, introducing a probabilistic model and a deep network architecture to improve understanding and estimation of attention over time.
Contribution
It proposes two novel computational models of spatio-temporal visual attention: a generative probabilistic model for context-aware attention and a deep network architecture for top-down attention.
Findings
The probabilistic model captures context-aware attention.
The deep network estimates top-down attention in videos.
Both models improve understanding of visual attention dynamics.
Abstract
This PhD thesis concerns the study and development of hierarchical representations for spatio-temporal visual attention modeling and understanding in video sequences. More specifically, we propose two computational models for visual attention. First, we present a generative probabilistic model for context-aware visual attention modeling and understanding. Second, we develop a deep network architecture for visual attention modeling, which first estimates top-down spatio-temporal visual attention and is ultimately used to model attention in the temporal domain.
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
