VS-Net: Multiscale Spatiotemporal Features for Lightweight Video Salient Document Detection

Hemraj Singh; Mridula Verma; Ramalingaswamy Cheruku

arXiv:2301.04447 · cs.CV · January 12, 2023

TL;DR

VS-Net is a lightweight deep learning model that effectively captures multiscale spatiotemporal features for video salient document detection, outperforming existing methods in accuracy and efficiency on benchmark datasets.

Contribution

The paper introduces VS-Net, a novel architecture that combines dilated depth-wise separable convolution with Approximation Rank Pooling to improve video salient document detection (VSDD) performance.
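To make the "lightweight" claim concrete, the following is a minimal sketch of the standard arithmetic behind depth-wise separable and dilated convolutions. It is not taken from the paper: the function names and the layer sizes in the usage note are hypothetical, biases are ignored, and square kernels are assumed.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depth-wise separable convolution (bias ignored).

    One k x k filter per input channel (depth-wise step), followed by
    a 1 x 1 convolution that mixes channels (point-wise step).
    """
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


def effective_kernel_size(k: int, dilation: int) -> int:
    """Spatial extent covered by a k-tap kernel with the given dilation.

    Dilation inserts (dilation - 1) gaps between taps, so the extent
    grows while the number of weights stays the same.
    """
    return dilation * (k - 1) + 1
```

For an illustrative layer with 64 input channels, 128 output channels, and a 3 x 3 kernel, the standard convolution has 73,728 weights while the separable version has 8,768. A dilation of 2 stretches the same 3 x 3 kernel over a 5 x 5 window at no extra parameter cost, which is one common way to gather multi-scale context cheaply.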

Findings

1. Outperforms state-of-the-art methods on the MIDV-500 dataset

2. Achieves higher robustness and efficiency

3. Effective in resource-constrained environments

Abstract

Video Salient Document Detection (VSDD) is an essential task in practical computer vision that aims to highlight visually salient document regions in video frames. Previous techniques for VSDD focus on learning features without considering the cooperation within and across appearance and motion cues, and thus fail to perform well in practical scenarios. Moreover, most previous techniques demand high computational resources, which limits their use in resource-constrained settings. To handle these issues, we propose VS-Net, which captures multi-scale spatiotemporal information with the help of dilated depth-wise separable convolution and Approximation Rank Pooling. VS-Net extracts the key features locally from each frame across embedding sub-spaces and forwards the features between adjacent and parallel nodes, enhancing model performance globally. Our model generates…
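The abstract does not spell out how Approximation Rank Pooling is computed. The sketch below assumes it refers to approximate rank pooling as used in the dynamic-image literature, where a clip of T frames is collapsed into a single map by a fixed weighted sum with coefficients α_t = 2(T − t + 1) − (T + 1)(H_T − H_{t−1}), H_t being the t-th harmonic number. VS-Net's exact formulation may differ. The function names and the representation of frames as flat lists of floats are hypothetical choices for illustration.

```python
from typing import List


def arp_coefficients(num_frames: int) -> List[float]:
    """Per-frame weights for approximate rank pooling (assumed formula).

    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}) for t = 1..T,
    where H_t is the t-th harmonic number and H_0 = 0.
    """
    harmonic = [0.0]
    for i in range(1, num_frames + 1):
        harmonic.append(harmonic[-1] + 1.0 / i)
    total = num_frames
    return [
        2 * (total - t + 1) - (total + 1) * (harmonic[total] - harmonic[t - 1])
        for t in range(1, total + 1)
    ]


def approximate_rank_pool(frames: List[List[float]]) -> List[float]:
    """Collapse a clip into one feature vector by a weighted sum over time.

    Each frame is a flat list of values with the same length.
    """
    coeffs = arp_coefficients(len(frames))
    width = len(frames[0])
    return [
        sum(coeffs[t] * frames[t][i] for t in range(len(frames)))
        for i in range(width)
    ]
```

Under this formula the weights sum to zero, so a static clip pools to zero and only temporal change survives. That property is why the pooled map can act as a compact motion cue alongside per-frame appearance features.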

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Convolution