Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff

Jia Li; Shengye Qiao; Zhirui Zhao; Chenxi Xie; Xiaowu Chen; Changqun Xia

arXiv:2301.06679·cs.CV·January 18, 2023

Open Access

TL;DR

This paper introduces a lightweight salient object detection framework that balances efficiency and accuracy by decoupling the U-shape structure into three branches and optimizing network depth and width for different application needs.

Contribution

The authors propose a novel trilateral decoder framework and a scale-adaptive pooling module that adds no learnable parameters, enabling effective lightweight models, and explore the depth-width tradeoff for salient object detection (SOD).
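To make the depth-width tradeoff concrete, the following is a minimal sketch that counts the learnable parameters of a plain stack of 3x3 convolutions. It is not the authors' architecture: the functions `conv_params` and `plain_stack_params` and the layer sizes are hypothetical, chosen only to show that parameter count grows roughly linearly with depth but quadratically with width.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Learnable parameters of one standard k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def plain_stack_params(depth, width, c_in=3, k=3):
    """Parameters of `depth` stacked convolutions, each with `width` channels.

    Hypothetical toy model: the first layer maps c_in -> width,
    the remaining (depth - 1) layers map width -> width.
    """
    total = conv_params(c_in, width, k)
    total += (depth - 1) * conv_params(width, width, k)
    return total


if __name__ == "__main__":
    # Compare a baseline against doubling depth and doubling width.
    for depth, width in [(4, 64), (8, 64), (4, 128)]:
        print(f"depth={depth} width={width} params={plain_stack_params(depth, width)}")
```

In this toy setting, doubling the depth a little more than doubles the parameter count, while doubling the width roughly quadruples it, which is why the two dimensions have to be traded against each other under a fixed budget.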

Findings

01. Achieves high FPS on resource-constrained devices

02. Outperforms existing methods on five benchmarks

03. Offers multiple model variants for different application scenarios

Abstract

Existing salient object detection methods often adopt deeper and wider networks for better performance, resulting in a heavy computational burden and slow inference speed. This inspires us to rethink saliency detection to achieve a favorable balance between efficiency and accuracy. To this end, we design a lightweight framework that maintains competitive accuracy. Specifically, we propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches, which are devised to confront the dilution of semantic context, the loss of spatial structure, and the absence of boundary detail, respectively. As the three branches are fused, the coarse segmentation results are gradually refined in structural detail and boundary quality. Without adding learnable parameters, we further propose a Scale-Adaptive Pooling Module to obtain…

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings