Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff
Jia Li, Shengye Qiao, Zhirui Zhao, Chenxi Xie, Xiaowu Chen, and Changqun Xia

TL;DR
This paper introduces a lightweight salient object detection framework that balances efficiency and accuracy by decoupling the U-shape structure into three branches and optimizing network depth and width for different application needs.
Contribution
The authors propose a novel trilateral decoder framework and a Scale-Adaptive Pooling Module that introduces no additional learnable parameters, enabling effective lightweight models, and explore the depth-width tradeoff for salient object detection (SOD).
Findings
Achieves high FPS on resource-constrained devices
Outperforms existing methods on five benchmarks
Offers multiple model variants for different application scenarios
Abstract
Existing salient object detection methods often adopt deeper and wider networks for better performance, resulting in a heavy computational burden and slow inference speed. This inspires us to rethink saliency detection to achieve a favorable balance between efficiency and accuracy. To this end, we design a lightweight framework that maintains competitive accuracy. Specifically, we propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches, which are devised to confront the dilution of semantic context, the loss of spatial structure, and the absence of boundary detail, respectively. As the three branches are fused, the coarse segmentation results are gradually refined in structural detail and boundary quality. Without introducing additional learnable parameters, we further propose a Scale-Adaptive Pooling Module to obtain…
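The abstract is cut off before it describes the Scale-Adaptive Pooling Module, so the module's actual design is not stated here. The sketch below only illustrates the general idea behind the "no additional learnable parameters" claim: aggregating context at several scales with plain average pooling requires no weights at all. The function names, the choice of scales, and the equal-weight averaging are all assumptions made for illustration, not the authors' method.

```python
# Illustrative only: parameter-free multi-scale context aggregation.
# This is NOT the paper's Scale-Adaptive Pooling Module; names and scales are hypothetical.

def avg_pool2d(feature, k):
    """Non-overlapping k x k average pooling on a 2D list (H and W must be divisible by k)."""
    h, w = len(feature), len(feature[0])
    return [
        [
            sum(feature[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def upsample_nearest(feature, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return [[v for v in row for _ in range(k)] for row in feature for _ in range(k)]


def multi_scale_pool(feature, scales=(1, 2, 4)):
    """Average the pooled-then-upsampled maps over all scales; uses no learnable weights."""
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for k in scales:
        restored = upsample_nearest(avg_pool2d(feature, k), k)
        for i in range(h):
            for j in range(w):
                out[i][j] += restored[i][j] / len(scales)
    return out
```

Calling `multi_scale_pool` on a 4x4 feature map returns a 4x4 map in which each position blends its own value with the means of its 2x2 and 4x4 neighbourhoods. Every operation is a fixed average, so the parameter count stays at zero regardless of how many scales are used.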
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
