From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection

Qi Qin, Runmin Cong, Gen Zhan, Yiting Liao, and Sam Kwong

arXiv:2506.23519 · cs.CV · July 1, 2025

Open Access

TL;DR

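This paper introduces a weakly supervised video salient object detection (VSOD) method that uses eye-tracking fixation data. A Position and Semantic Embedding (PSE) module supplies location and semantic guidance during feature learning, and contrastive learning modules support spatiotemporal modeling. The method is reported to outperform existing methods on five benchmark datasets.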

Contribution

The paper proposes a weakly supervised framework that uses fixation data, through a Position and Semantic Embedding (PSE) module, for location and semantic guidance, and introduces contrastive learning modules for improved spatiotemporal modeling.

Findings

01

Outperforms existing methods on five benchmark datasets

02

Effective utilization of eye-tracking data enhances detection accuracy

03

Novel modules improve feature learning and contrastive modeling

Abstract

The eye-tracking video saliency prediction (VSP) task and video salient object detection (VSOD) task both focus on the most attractive objects in video and show the result in the form of predictive heatmaps and pixel-level saliency masks, respectively. In practical applications, eye tracker annotations are more readily obtainable and align closely with the authentic visual patterns of human eyes. Therefore, this paper aims to introduce fixation information to assist the detection of video salient objects under weak supervision. On the one hand, we ponder how to better explore and utilize the information provided by fixation, and then propose a Position and Semantic Embedding (PSE) module to provide location and semantic guidance during the feature learning process. On the other hand, we achieve spatiotemporal feature modeling under weak supervision from the aspects of feature selection…
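To make the distinction between the two output forms in the abstract concrete, here is a minimal sketch of a fixation heatmap and a binary pixel-level mask for a single frame. It is not the paper's method: the function names `fixation_heatmap` and `coarse_mask`, the Gaussian width `sigma`, the threshold, and the sample fixation coordinates are all assumptions chosen for illustration.

```python
import numpy as np


def fixation_heatmap(points, height, width, sigma=10.0):
    """Sum isotropic Gaussians centred on (row, col) fixations, scaled to [0, 1].

    Hypothetical helper: sigma is an illustrative choice, not a value from the paper.
    """
    rows = np.arange(height, dtype=np.float64)[:, None]
    cols = np.arange(width, dtype=np.float64)[None, :]
    heat = np.zeros((height, width), dtype=np.float64)
    for r, c in points:
        heat += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    peak = heat.max()
    # Leave an all-zero map untouched when there are no fixations.
    return heat / peak if peak > 0 else heat


def coarse_mask(heatmap, threshold=0.5):
    """Threshold a heatmap into a binary mask (assumed step, for illustration only)."""
    return (heatmap >= threshold).astype(np.uint8)


# Made-up fixation coordinates for one 96x128 frame.
fixations = [(40, 60), (44, 66)]
heat = fixation_heatmap(fixations, height=96, width=128)
mask = coarse_mask(heat)
```

The heatmap is continuous and peaks near where gaze lands, while the mask is a hard per-pixel decision. A thresholded heatmap like this marks only a blob around the fixations, not the object's true extent, which is one way to see why fixation data is a weak form of supervision for VSOD.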

Taxonomy

Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Image and Video Quality Assessment