Global and Local Sensitivity Guided Key Salient Object Re-augmentation for Video Saliency Detection
Ziqi Zhou, Zheng Wang, Huchuan Lu, Song Wang, Meijun Sun

TL;DR
This paper introduces KSORA, a novel video saliency detection method that combines local feature selection and global object ranking to enhance key salient object detection in dynamic scenes, outperforming existing methods.
Contribution
The paper proposes KSORA, an approach that integrates top-down and bottom-up strategies to improve key salient object detection in videos. It addresses a limitation of earlier static-image methods, in which all extracted features contribute equally to the saliency decision.
Findings
KSORA achieves higher detection accuracy than existing methods on benchmark datasets.
The method runs at 17 FPS on modern GPUs, demonstrating real-time capability.
KSORA outperforms ten state-of-the-art algorithms in complex scene detection.
Abstract
Existing deep-learning-based saliency methods for static images do not weight or highlight the features extracted from different layers; all features contribute equally to the final saliency decision. Such methods detect all "potentially significant regions" evenly and are unable to highlight the key salient object, which leads to detection failures in dynamic scenes. In this paper, based on the observation that salient areas in videos are relatively small and concentrated, we propose a key salient object re-augmentation method (KSORA) that uses top-down semantic knowledge and bottom-up feature guidance to improve detection accuracy in video scenes. KSORA includes two sub-modules (WFE and KOS): WFE performs local salient feature selection using a bottom-up strategy, while KOS ranks each object globally using top-down statistical knowledge, and chooses the…
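The contrast the abstract draws (equal contribution of all features versus weighted selection, followed by a global ranking of objects) can be illustrated with a small NumPy sketch. This is not the authors' WFE or KOS implementation: the function names, the softmax weighting, the mean-saliency ranking criterion, and the `gain` parameter are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def fuse_equal(features):
    """Baseline: every layer's feature map contributes equally."""
    return np.mean(features, axis=0)

def fuse_weighted(features, scores):
    """Hypothetical weighting: softmax over per-layer scores, then a weighted sum."""
    scores = np.asarray(scores, dtype=float)
    w = np.exp(scores - scores.max())  # subtract max for numerical stability
    w /= w.sum()
    return np.tensordot(w, features, axes=1)  # (L,) x (L, H, W) -> (H, W)

def rank_objects(saliency, masks):
    """Hypothetical ranking: order candidate masks by mean saliency, best first."""
    means = [saliency[m].mean() for m in masks]
    return sorted(range(len(masks)), key=lambda i: means[i], reverse=True)

def reaugment(saliency, mask, gain=1.5):
    """Hypothetical re-augmentation: boost the chosen object, clip to [0, 1]."""
    return np.clip(saliency * np.where(mask, gain, 1.0), 0.0, 1.0)

# Toy data: two 2x2 "layers" and two candidate object masks.
features = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
saliency = np.array([[0.6, 0.1], [0.2, 0.3]])
masks = [
    np.array([[True, False], [False, False]]),
    np.array([[False, True], [True, True]]),
]
order = rank_objects(saliency, masks)
boosted = reaugment(saliency, masks[order[0]])
```

With equal scores, `fuse_weighted` reduces to `fuse_equal`, which is the behavior the abstract attributes to earlier methods; unequal scores let one layer dominate the fused map.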
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
