Global-Local Feature Decoding with Adapter-Guided SAMv2 for Salient Object Detection
Morteza Moradi, Mohammad Moradi, Simone Palazzo, Ali Borji, Concetto Spampinato

TL;DR
This paper introduces GLASSNet, a novel framework that leverages a frozen foundation model with a dual-decoder architecture to improve salient object detection by combining global semantics and local details.
Contribution
GLASSNet pairs a frozen SAMv2 encoder with a lightweight adapter and a dual decoder for global-local feature decoding in SOD.
Findings
GLASSNet surpasses state-of-the-art methods on standard benchmarks.
The adapter reduces learnable encoder parameters by over 97%.
Fusion of global and local cues yields more accurate saliency maps.
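To make the parameter-reduction finding concrete, here is a minimal arithmetic sketch. It assumes the reduction is measured against fully fine-tuning the encoder, which this summary does not state explicitly. The function name and its arguments are hypothetical, and no actual parameter counts from the paper are used.

```python
def learnable_reduction(encoder_params: int, adapter_params: int) -> float:
    """Percentage reduction in learnable parameters when only an adapter
    is trained instead of the full encoder (assumed baseline)."""
    if encoder_params <= 0:
        raise ValueError("encoder_params must be positive")
    if adapter_params < 0:
        raise ValueError("adapter_params must be non-negative")
    return 100.0 * (1.0 - adapter_params / encoder_params)


def meets_claim(encoder_params: int, adapter_params: int,
                threshold_pct: float = 97.0) -> bool:
    """True if the adapter trains few enough weights to exceed the threshold."""
    return learnable_reduction(encoder_params, adapter_params) > threshold_pct
```

Under this reading, a reduction of over 97% means the adapter trains fewer than 3% as many weights as the encoder it is attached to.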
Abstract
Salient Object Detection (SOD) remains an essential yet underexplored task in the era of large-scale vision models. Although foundation models like SAM exhibit strong generalization, their potential for SOD is not fully realized, and training or fully fine-tuning them is computationally expensive and prone to overfitting under limited data. To overcome these challenges, we introduce GLASSNet, a Global-Local feature decoding framework that uses SAMv2 as a frozen encoder paired with a lightweight, spatially aware convolutional adapter, reducing learnable encoder parameters by over 97%. To enhance saliency quality, GLASSNet employs a dual-decoder architecture: one decoder captures global, long-range semantics with an expanded receptive field, while the other captures fine local details such as edges and textures. Fusing these complementary cues yields saliency maps that combine global…
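The design described in the abstract can be illustrated with a small PyTorch sketch. It is not the authors' implementation, and every class name and layer choice is an assumption made for illustration. `ToyEncoder` is a tiny stand-in for the frozen backbone (it is not SAMv2). `ConvAdapter` is a hypothetical bottleneck adapter applied once after the encoder. The global branch uses a dilated convolution as one possible way to expand the receptive field, and fusion is a simple concatenation followed by a 1x1 convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Stand-in for a pretrained backbone (NOT SAMv2); it is kept frozen."""

    def __init__(self, ch: int = 32):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, kernel_size=4, stride=4)  # 1/4 resolution
        self.block = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        return F.relu(self.block(F.relu(self.stem(x))))


class ConvAdapter(nn.Module):
    """Hypothetical lightweight adapter: 1x1 down, depthwise 3x3, 1x1 up, residual."""

    def __init__(self, ch: int = 32, hidden: int = 8):
        super().__init__()
        self.down = nn.Conv2d(ch, hidden, kernel_size=1)
        self.spatial = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.up = nn.Conv2d(hidden, ch, kernel_size=1)

    def forward(self, f):
        return f + self.up(F.relu(self.spatial(self.down(f))))


class DualDecoderSOD(nn.Module):
    """Frozen encoder -> trainable adapter -> global and local branches -> fusion."""

    def __init__(self, ch: int = 32):
        super().__init__()
        self.encoder = ToyEncoder(ch)
        for p in self.encoder.parameters():
            p.requires_grad = False  # encoder weights receive no updates
        self.adapter = ConvAdapter(ch)
        # Global branch: dilation widens the receptive field (an assumed mechanism).
        self.global_dec = nn.Conv2d(ch, ch, kernel_size=3, padding=4, dilation=4)
        # Local branch: plain 3x3 convolution for fine detail such as edges.
        self.local_dec = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * ch, 1, kernel_size=1)

    def forward(self, x):
        with torch.no_grad():
            feats = self.encoder(x)
        feats = self.adapter(feats)
        g = F.relu(self.global_dec(feats))
        l = F.relu(self.local_dec(feats))
        logits = self.fuse(torch.cat([g, l], dim=1))
        logits = F.interpolate(logits, size=x.shape[-2:], mode="bilinear",
                               align_corners=False)
        return torch.sigmoid(logits)  # per-pixel saliency in [0, 1]


model = DualDecoderSOD()
saliency = model(torch.randn(2, 3, 64, 64))
print(saliency.shape)  # torch.Size([2, 1, 64, 64])
```

In this sketch only the adapter, the two decoder branches, and the fusion layer carry gradients. Adapters are often inserted inside the encoder's blocks rather than after them, so the placement here is a simplification.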
