TL;DR
TF-SSD is a training-free pipeline that combines two Vision Foundation Models, SAM and DINO, to improve co-salient object detection by filtering and selecting masks based on saliency and prototype similarity.
Contribution
The paper proposes a novel training-free method, TF-SSD, that synergizes SAM and DINO for effective co-salient object detection without relying on training data.
Findings
TF-SSD outperforms existing training-free methods by 13.7%.
The method effectively filters and selects masks based on intra- and inter-image saliency.
Extensive experiments validate the superior performance of TF-SSD.
Abstract
Co-salient Object Detection (CoSOD) aims to segment salient objects that consistently appear across a group of related images. Despite the notable progress achieved by recent training-based approaches, they remain constrained by closed-set datasets and exhibit limited generalization. However, few studies explore the potential of Vision Foundation Models (VFMs), which demonstrate strong generalization ability and robust saliency understanding, to address CoSOD. In this paper, we investigate and leverage VFMs for CoSOD, and propose a novel training-free method, TF-SSD, through the synergy between SAM and DINO. Specifically, we first utilize SAM to generate comprehensive raw proposals, which serve as a candidate mask pool. Then, we introduce a quality mask generator to filter out redundant masks, thereby acquiring a refined mask set. Since this generator is built upon SAM,…
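The proposal, filter, and select flow described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract is truncated, so the IoU-based deduplication, the mean-feature prototype, and the function names `drop_redundant_masks` and `select_by_prototype` are all assumptions made for illustration. Masks are taken to be boolean arrays (as a SAM-style proposal generator might produce), and each mask is assumed to come with a feature vector standing in for a DINO embedding.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def drop_redundant_masks(masks, iou_thresh=0.9):
    """Hypothetical redundancy filter (an assumption, not the paper's
    quality mask generator): visit masks from largest to smallest and
    keep a mask only if it does not overlap an already kept mask with
    IoU >= iou_thresh. Returns the kept indices in ascending order."""
    order = sorted(range(len(masks)), key=lambda i: -int(masks[i].sum()))
    kept = []
    for i in order:
        if all(mask_iou(masks[i], masks[j]) < iou_thresh for j in kept):
            kept.append(i)
    return sorted(kept)

def select_by_prototype(features_per_image):
    """Hypothetical prototype-based selection: average the L2-normalized
    mask features over the whole image group to form a prototype, then
    pick, per image, the mask whose feature has the highest cosine
    similarity to it. Assumes every feature vector is nonzero.

    features_per_image: list of (n_i, d) arrays, one array per image."""
    normed = [f / np.linalg.norm(f, axis=1, keepdims=True)
              for f in features_per_image]
    prototype = np.concatenate(normed).mean(axis=0)
    prototype = prototype / np.linalg.norm(prototype)
    return [int(np.argmax(f @ prototype)) for f in normed]

# Toy usage: two identical masks plus one disjoint mask.
top_left = np.zeros((4, 4), dtype=bool)
top_left[:2, :2] = True
bottom_right = np.zeros((4, 4), dtype=bool)
bottom_right[2:, 2:] = True
kept = drop_redundant_masks([top_left, top_left.copy(), bottom_right])

# Toy features: the direction [1, 0] recurs in both images.
picks = select_by_prototype([
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.0, -1.0], [1.0, 0.0]]),
])
```

In the toy usage, the duplicate mask is discarded and the feature direction shared by both images wins the selection. A real pipeline would replace the toy arrays with SAM proposals and DINO features, and the threshold value of 0.9 is an arbitrary placeholder.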
