Revisiting Salient Object Detection from an Observer-Centric Perspective
Fuxi Zhang, Yifan Wang, Hengrun Zhao, Zhuohan Sun, Changxing Xia, Lijun Wang, Huchuan Lu, Yangrui Shao, Chen Yang, Long Teng

TL;DR
This paper introduces an observer-centric approach to salient object detection, incorporating observer-specific factors to better model human perception and creating a new dataset and baseline for personalized saliency prediction.
Contribution
It proposes a novel observer-centric formulation for salient object detection, develops a large dataset with textual prompts, and designs an agentic baseline model to capture perception diversity.
Findings
The OC-SOD dataset contains 33k images with textual prompts and object pairs.
The OC-SODAgent baseline effectively models personalized saliency prediction.
Experiments show improved alignment with human perception.
Abstract
Salient object detection is inherently a subjective problem, as observers with different priors may perceive different objects as salient. However, existing methods predominantly formulate it as an objective prediction task with a single ground-truth segmentation map for each image, which renders the problem under-determined and fundamentally ill-posed. To address this issue, we propose Observer-Centric Salient Object Detection (OC-SOD), where salient regions are predicted by considering not only visual cues but also observer-specific factors such as preferences or intents. As a result, this formulation captures the intrinsic ambiguity and diversity of human perception, enabling personalized and context-aware saliency prediction. By leveraging multi-modal large language models, we develop an efficient data annotation pipeline and construct the first OC-SOD dataset named…
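The difference between the two formulations can be illustrated with a toy sketch. This is not the paper's OC-SODAgent or its dataset format. The `SceneObject` type, the `visual_score` field, the keyword-matching rule, and the example prompts are all hypothetical, chosen only to show that an observer-agnostic predictor returns one mask per image, while an observer-conditioned predictor can return different masks for the same image.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SceneObject:
    name: str            # hypothetical object label
    mask: np.ndarray     # boolean mask with the image's height and width
    visual_score: float  # stand-in for bottom-up visual saliency


def classic_sod(objects):
    """Observer-agnostic: one mask per image, chosen from visual cues alone."""
    return max(objects, key=lambda o: o.visual_score).mask


def observer_centric_sod(objects, observer_prompt):
    """Observer-conditioned: the prompt can change which object is salient."""
    words = set(observer_prompt.lower().split())
    # Toy matching rule (an assumption): objects named in the prompt win,
    # falling back to visual cues when the prompt mentions none of them.
    mentioned = [o for o in objects if o.name in words]
    return classic_sod(mentioned or objects)


def make_mask(rows, cols):
    """Build a 4x4 boolean mask that is True inside the given slices."""
    mask = np.zeros((4, 4), dtype=bool)
    mask[rows, cols] = True
    return mask


# A toy "image" with two candidate objects.
objects = [
    SceneObject("dog", make_mask(slice(0, 2), slice(0, 2)), 0.9),
    SceneObject("bicycle", make_mask(slice(2, 4), slice(2, 4)), 0.4),
]

mask_default = classic_sod(objects)
mask_cyclist = observer_centric_sod(objects, "I want to rent a bicycle")
mask_neutral = observer_centric_sod(objects, "nothing in particular")
```

Here `mask_default` and `mask_neutral` both select the dog, while `mask_cyclist` selects the bicycle: the same image yields different salient regions once the observer's intent is part of the input.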
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Face Recognition and Perception
