Weakly Supervised Video Salient Object Detection via Point Supervision
Shuyong Gao, Haozhe Xing, Wei Zhang, Yan Wang, Qianyu Guo, Wenqiang Zhang

TL;DR
This paper introduces a novel point-supervised approach for video salient object detection, leveraging inter-frame information and attention modules to achieve high performance with minimal annotation effort.
Contribution
It proposes a new point supervision method with hybrid token attention and long-term cross-frame attention modules for effective video saliency detection.
Findings
Outperforms previous weakly supervised methods
Comparable to some fully supervised approaches
Introduces two new point-supervised datasets
Abstract
Video salient object detection models trained on pixel-wise dense annotation have achieved excellent performance, yet obtaining pixel-by-pixel annotated datasets is laborious. Several works attempt to use scribble annotations to mitigate this problem, but point supervision, a more labor-saving annotation method (even the most labor-saving among manual annotation methods for dense prediction), has not been explored. In this paper, we propose a strong baseline model based on point supervision. To infer saliency maps with temporal information, we mine inter-frame complementary information from both short-term and long-term perspectives. Specifically, we propose a hybrid token attention module, which mixes optical flow and image information from orthogonal directions, adaptively highlighting critical optical flow information (channel dimension) and critical token…
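To make "mixing from orthogonal directions" more concrete, here is a minimal sketch of gating a token matrix along its channel axis and its token axis. It is not the authors' hybrid token attention module: the function name `hybrid_gate`, the parameter-free sigmoid gates, and the simple concatenation of image and flow features are all assumptions chosen for illustration, and the real module presumably uses learned attention weights.

```python
import math


def sigmoid(x):
    """Logistic function used as a simple gate in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_gate(image_tokens, flow_tokens):
    """Toy gating over two orthogonal axes of a token matrix.

    image_tokens, flow_tokens: lists of per-token feature lists with the
    same number of tokens. Hypothetical helper, not the paper's module.
    """
    if len(image_tokens) != len(flow_tokens):
        raise ValueError("image and flow must have the same number of tokens")

    # Concatenate image and flow features for each token: shape (N, C).
    mixed = [img + flo for img, flo in zip(image_tokens, flow_tokens)]
    n_tokens = len(mixed)
    n_channels = len(mixed[0])

    # Channel direction: one gate per channel, from the mean over tokens.
    channel_gate = [
        sigmoid(sum(tok[c] for tok in mixed) / n_tokens)
        for c in range(n_channels)
    ]

    # Token direction: one gate per token, from the mean over channels.
    token_gate = [sigmoid(sum(tok) / n_channels) for tok in mixed]

    # Reweight every entry by both its channel gate and its token gate.
    return [
        [tok[c] * channel_gate[c] * token_gate[t] for c in range(n_channels)]
        for t, tok in enumerate(mixed)
    ]


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 0.0]]  # 2 tokens, 2 image channels
    flow = [[2.0], [0.0]]             # 2 tokens, 1 flow channel
    for row in hybrid_gate(image, flow):
        print(row)
```

In this sketch the channel gate decides which feature channels (including the optical flow channel) matter across the frame, while the token gate decides which spatial locations matter. Multiplying the two is only one plausible way to combine them.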
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Face Recognition and Perception
