TL;DR
This paper introduces a novel spatiotemporal network for real-time video saliency detection that enables efficient mutual enhancement of spatial and temporal branches through interactive modeling, achieving high accuracy at 50 FPS.
Contribution
The paper proposes an interactive spatiotemporal network integrating a lightweight temporal model with a recurrent spatial branch for improved real-time video saliency detection.
Findings
Achieves 50 FPS in real-time video saliency detection.
Enhances saliency accuracy through mutual spatial-temporal interaction.
Locates salient regions that exhibit salient movements.
Abstract
Current mainstream methods formulate video saliency mainly from two independent avenues, i.e., the spatial and temporal branches. As a complementary component, the main task of the temporal branch is to intermittently focus the spatial branch on regions with salient movements. Thus, although the overall video saliency quality depends heavily on the spatial branch, the performance of the temporal branch still matters. The key to improving overall video saliency is therefore to boost the performance of both branches efficiently. In this paper, we propose a novel spatiotemporal network that achieves such improvement in a fully interactive fashion. We integrate a lightweight temporal model into the spatial branch to coarsely locate those spatially salient regions that are correlated with trustworthy salient movements. Meanwhile, the…
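As a rough illustration of the idea that a temporal cue can focus a spatial saliency map on moving regions, the sketch below gates a per-pixel spatial score with a normalized frame difference. This is not the paper's network: the frame-difference cue, the multiplicative gate, and the names `motion_cue`, `gate_spatial`, and `floor` are all assumptions made for illustration only.

```python
def motion_cue(prev_frame, curr_frame):
    """Absolute frame difference scaled to [0, 1].

    Assumption: a crude stand-in for a learned temporal model.
    """
    diff = [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_frame, curr_frame)]
    peak = max(max(row) for row in diff)
    if peak == 0:
        # No motion anywhere: return an all-zero cue.
        return [[0.0 for _ in row] for row in diff]
    return [[v / peak for v in row] for row in diff]


def gate_spatial(spatial, motion, floor=0.5):
    """Scale each spatial score by floor + (1 - floor) * motion.

    Static regions are damped to `floor` of their score rather than erased,
    so the spatial map still dominates the result (hypothetical design choice).
    """
    return [[s * (floor + (1.0 - floor) * m) for s, m in zip(srow, mrow)]
            for srow, mrow in zip(spatial, motion)]


# Toy 2x2 grayscale frames: only the top-right pixel changes.
prev_frame = [[0.0, 0.0], [0.0, 0.0]]
curr_frame = [[0.0, 1.0], [0.0, 0.0]]
spatial = [[0.8, 0.8], [0.2, 0.2]]  # made-up spatial saliency scores

motion = motion_cue(prev_frame, curr_frame)
gated = gate_spatial(spatial, motion)
```

In this toy case the two top pixels start with the same spatial score, and only the one that moved keeps its full value after gating.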
