Learning Saliency Prediction From Sparse Fixation Pixel Map
Shanghua Xiao

TL;DR
This paper introduces a novel method for saliency prediction that learns directly from sparse fixation pixel maps, utilizing clustering and max-pooling to improve training, and achieves competitive results on benchmark datasets.
Contribution
The paper presents itself as the first to explore learning saliency prediction from sparse fixation pixel maps, and it proposes a new loss function tailored to such sparse labels.
Findings
Achieves competitive performance on multiple benchmark datasets.
Introduces a clustering-based approach to extract sparse fixation pixels.
Develops a max-pooling transformation to improve training from sparse data.
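To illustrate why a max-pooling transformation can help with sparse labels, here is a minimal sketch in plain Python. It is not the paper's implementation: the non-overlapping window, the squared-error loss, applying the pooling to both the prediction and the label, and the names max_pool2d and squared_error are all assumptions made for illustration. The toy case places a predicted point one pixel away from the labelled fixation.

```python
def max_pool2d(grid, k):
    """Non-overlapping k x k max pooling over a 2D list (trailing edges are dropped)."""
    rows, cols = len(grid) // k, len(grid[0]) // k
    return [
        [max(grid[r * k + i][c * k + j] for i in range(k) for j in range(k))
         for c in range(cols)]
        for r in range(rows)
    ]


def squared_error(a, b):
    """Sum of squared pixel-wise differences between two equally sized 2D lists."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical 4 x 4 maps: the label marks a fixation at (row 2, col 3),
# while the prediction fires at the neighbouring pixel (row 2, col 2).
label = [[0.0] * 4 for _ in range(4)]
pred = [[0.0] * 4 for _ in range(4)]
label[2][3] = 1.0
pred[2][2] = 1.0

# Pixel-wise comparison penalises both the "missed" label and the "spurious" output.
raw_loss = squared_error(pred, label)

# After 2 x 2 pooling both points fall into the same cell, so the penalty vanishes.
pooled_loss = squared_error(max_pool2d(pred, 2), max_pool2d(label, 2))

print(raw_loss, pooled_loss)
```

In this sketch two points that straddle a window boundary would still be penalised; an overlapping (stride-1) window is one possible alternative, and the summary does not say which variant the paper uses.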
Abstract
Ground truth for saliency prediction datasets consists of two types of map data: the fixation pixel map, which records human eye movements on sample images, and the fixation blob map, generated by applying Gaussian blurring to the corresponding fixation pixel map. Current saliency approaches perform prediction by directly regressing the input image pixel-wise into a saliency map with the fixation blob map as ground truth, yet learning saliency from the fixation pixel map has not been explored. In this work, we propose a first-of-its-kind approach to learning saliency prediction from a sparse fixation pixel map, and a novel loss function for training from such sparse fixations. We utilize clustering to extract sparse fixation pixels from the raw fixation pixel map, and add a max-pooling transformation on the output to avoid false penalty between sparse outputs and labels caused by nearby but non-overlapping…
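The relationship between the two ground-truth types described in the abstract can be sketched as follows: a sparse fixation pixel map is turned into a dense fixation blob map by Gaussian blurring. This is a minimal plain-Python sketch, not any dataset's actual pipeline; the kernel radius, sigma, zero padding, peak normalisation, and the function names are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(grid, kernel):
    """Convolve every row with the 1D kernel, treating out-of-range pixels as zero."""
    radius = len(kernel) // 2
    width = len(grid[0])
    out = []
    for row in grid:
        new_row = []
        for x in range(width):
            acc = 0.0
            for offset, w in enumerate(kernel):
                src = x + offset - radius
                if 0 <= src < width:
                    acc += w * row[src]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def fixation_blob_map(pixel_map, sigma=1.0, radius=2):
    """Separable Gaussian blur (rows, then columns), rescaled so the peak is 1."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(pixel_map, kernel)
    blurred = transpose(blur_rows(transpose(horizontal), kernel))
    peak = max(max(row) for row in blurred)
    if peak == 0:
        return blurred
    return [[v / peak for v in row] for row in blurred]


# Hypothetical 7 x 7 fixation pixel map with a single recorded fixation.
pixel_map = [[0.0] * 7 for _ in range(7)]
pixel_map[3][3] = 1.0
blob_map = fixation_blob_map(pixel_map)

# Count how many pixels carry signal in each representation.
nonzero_pixels = sum(v > 0 for row in pixel_map for v in row)
nonzero_blob = sum(v > 0 for row in blob_map for v in row)
print(nonzero_pixels, nonzero_blob)
```

The single labelled pixel spreads over a whole neighbourhood in the blob map, which is why dense pixel-wise regression suits blob maps, while the sparse pixel map calls for a different training signal.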
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Image and Video Quality Assessment
