VS-Net: Multiscale Spatiotemporal Features for Lightweight Video Salient Document Detection
Hemraj Singh, Mridula Verma, Ramalingaswamy Cheruku

TL;DR
VS-Net is a lightweight deep learning model that effectively captures multiscale spatiotemporal features for video salient document detection, outperforming existing methods in accuracy and efficiency on benchmark datasets.
Contribution
The paper introduces VS-Net, a novel architecture that integrates dilated depth-wise separable convolution and Approximation Rank Pooling to improve video salient document detection (VSDD) performance.
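To illustrate why a dilated depth-wise separable convolution suits a lightweight model, the sketch below compares its weight count with that of a standard convolution and computes how dilation widens the receptive field of a single kernel. The channel counts, kernel size, and dilation rates are illustrative assumptions, not values from the paper, and the helper names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k convolution followed by a 1 x 1 point-wise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k-tap kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable)              # 73728 8768
print(round(standard / separable, 2))   # 8.41

# Dilation enlarges the covered area without adding any weights.
print([effective_kernel_size(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
```

Under these assumed sizes the separable layer needs roughly one eighth of the weights, while running the same 3 x 3 kernel at several dilation rates is one way to gather multiscale context at no extra parameter cost.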
Findings
Outperforms state-of-the-art methods on the MIDV-500 dataset
Achieves higher robustness and efficiency than existing methods
Effective in resource-constrained environments
Abstract
Video Salient Document Detection (VSDD) is an essential task of practical computer vision, which aims to highlight visually salient document regions in video frames. Previous techniques for VSDD focus on learning features without considering the cooperation among and across the appearance and motion cues and thus fail to perform in practical scenarios. Moreover, most of the previous techniques demand high computational resources, which limits the usage of such systems in resource-constrained settings. To handle these issues, we propose VS-Net, which captures multi-scale spatiotemporal information with the help of dilated depth-wise separable convolution and Approximation Rank Pooling. VS-Net extracts the key features locally from each frame across embedding sub-spaces and forwards the features between adjacent and parallel nodes, enhancing model performance globally. Our model generates…
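The abstract does not spell out how Approximation Rank Pooling is computed. As a rough sketch, assuming the term refers to approximate rank pooling as used for dynamic images in the action-recognition literature, a clip of T frames can be collapsed into one motion-summarising image by a weighted sum with coefficients alpha_t = 2t - T - 1. The function names and the toy clip are hypothetical, and plain Python lists are used only to keep the example self-contained.

```python
def rank_pooling_weights(num_frames):
    """Coefficients alpha_t = 2t - T - 1 for t = 1..T (simplest approximate form)."""
    return [2 * t - num_frames - 1 for t in range(1, num_frames + 1)]


def approximate_rank_pool(frames):
    """Collapse a list of T frames (each an H x W nested list) into one H x W map."""
    weights = rank_pooling_weights(len(frames))
    height, width = len(frames[0]), len(frames[0][0])
    pooled = [[0.0] * width for _ in range(height)]
    for weight, frame in zip(weights, frames):
        for y in range(height):
            for x in range(width):
                pooled[y][x] += weight * frame[y][x]
    return pooled


# Toy clip: 4 frames of size 1 x 2.
# The left pixel is static; the right pixel brightens over time.
clip = [[[5, 0]], [[5, 1]], [[5, 2]], [[5, 3]]]
print(rank_pooling_weights(4))      # [-3, -1, 1, 3]
print(approximate_rank_pool(clip))  # [[0.0, 10.0]]
```

Because the coefficients sum to zero, static content cancels out and only temporal change survives, which is why this kind of pooling is a cheap way to encode motion cues alongside per-frame appearance.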
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Convolution
