Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning

Luyao Tang; Yuxuan Yuan; Chaoqi Chen; Kunze Huang; Xinghao Ding; Yue Huang

arXiv:2408.16310 · cs.CV · August 30, 2024

TL;DR

This paper introduces SlotSAM, a self-supervised, object-centric learning method that enhances foundation models' ability to generalize across out-of-distribution data, especially in challenging environments like medical and camouflaged images.

Contribution

SlotSAM reconstructs encoder features into object-centric representations, improving foundation models' robustness and generalization with minimal fine-tuning and a simple, adaptable approach.

Findings

1. Significantly improves out-of-distribution generalization
2. Enhances object-level perceptual capabilities of foundation models
3. Requires limited parameter fine-tuning

Abstract

Foundation models have made incredible strides in achieving zero-shot or few-shot generalization, leveraging prompt engineering to mimic the problem-solving approach of human intelligence. However, when it comes to some foundation models like Segment Anything, there is still a challenge in performing well on out-of-distribution data, including camouflaged and medical images. Inconsistent prompting strategies during fine-tuning and testing further compound the issue, leading to decreased performance. Drawing inspiration from how human cognition processes new environments, we introduce SlotSAM, a method that reconstructs features from the encoder in a self-supervised manner to create object-centric representations. These representations are then integrated into the foundation model, bolstering its object-level perceptual capabilities while reducing the impact of distribution-related…
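
To make the idea of turning encoder features into object-centric representations concrete, here is a minimal sketch in Python with NumPy. It assumes a simplified slot-attention-style grouping, which is a common object-centric mechanism; the abstract does not specify SlotSAM's actual architecture, so the function names (`slot_attention`, `reconstruction_loss`), the number of slots, and the update rule are all illustrative assumptions, not the authors' implementation. The sketch omits the learned projections and recurrent update that full slot attention uses.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(features, num_slots=4, iters=3, seed=0, eps=1e-8):
    """Group encoder tokens into slots (hypothetical, simplified sketch).

    features: (N, D) array of encoder tokens.
    Returns slots of shape (K, D) and the last attention map of shape (K, N).
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    slots = rng.normal(size=(num_slots, d))  # random slot initialisation
    for _ in range(iters):
        logits = slots @ features.T / np.sqrt(d)  # (K, N) similarities
        attn = softmax(logits, axis=0)            # slots compete for each token
        weights = attn / (attn.sum(axis=1, keepdims=True) + eps)  # normalise per slot
        slots = weights @ features                # each slot becomes a weighted mean
    return slots, attn

def reconstruction_loss(features, slots, attn):
    # Self-supervised signal: rebuild each token as a mixture of slots
    # and measure the mean squared error against the original features.
    recon = attn.T @ slots  # (N, D)
    return float(np.mean((recon - features) ** 2))

if __name__ == "__main__":
    feats = np.random.default_rng(1).normal(size=(16, 8))  # stand-in for encoder output
    slots, attn = slot_attention(feats)
    print(slots.shape, attn.shape, reconstruction_loss(feats, slots, attn))
```

In this sketch the softmax is taken over the slot axis, so every token distributes its attention across slots; that competition is what encourages each slot to specialise on a distinct region. The reconstruction loss needs no labels, which is the sense in which such a scheme is self-supervised.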

Code & Models

Repositories

lytang63/slotsam (official)

Taxonomy

Topics: Advanced Clustering Algorithms Research · Image Processing and 3D Reconstruction · Machine Learning and Data Classification