Quickly Tuning Foundation Models for Image Segmentation
Breenda Das, Lennart Purucker, Timur Carstensen, Frank Hutter

TL;DR
This paper introduces QTT-SEG, a meta-learning approach that automates and accelerates the fine-tuning of foundation models like SAM for image segmentation, improving on SAM's zero-shot performance under tight time budgets across eight binary and five multiclass datasets.
Contribution
QTT-SEG uses meta-learned cost and performance models to predict high-performing hyperparameter configurations, reducing the manual effort and time needed to fine-tune foundation models for domain-specific image segmentation tasks.
Findings
QTT-SEG outperforms SAM's zero-shot performance on multiple datasets.
QTT-SEG surpasses AutoGluon Multimodal within three minutes on most binary tasks.
QTT-SEG provides consistent improvements on multiclass segmentation datasets.
Abstract
Foundation models like SAM (Segment Anything Model) exhibit strong zero-shot image segmentation performance, but often fall short on domain-specific tasks. Fine-tuning these models typically requires significant manual effort and domain expertise. In this work, we introduce QTT-SEG, a meta-learning-driven approach for automating and accelerating the fine-tuning of SAM for image segmentation. Built on the Quick-Tune hyperparameter optimization framework, QTT-SEG predicts high-performing configurations using meta-learned cost and performance models, efficiently navigating a search space of over 200 million possibilities. We evaluate QTT-SEG on eight binary and five multiclass segmentation datasets under tight time constraints. Our results show that QTT-SEG consistently improves upon SAM's zero-shot performance and surpasses AutoGluon Multimodal, a strong AutoML baseline, on most binary…
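To make the idea of meta-learned cost and performance models more concrete, here is a minimal sketch of budget-aware configuration selection. It is an illustration only, not the QTT-SEG or Quick-Tune implementation: the `Config` fields, the `select_config` helper, and the lookup-table "predictors" are all hypothetical stand-ins for learned surrogate models. In a search space of over 200 million configurations, a real system would score only a sampled subset of candidates rather than enumerate them all.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Config:
    # Hypothetical hyperparameters, chosen only for illustration.
    learning_rate: float
    batch_size: int


def select_config(
    candidates: Sequence[Config],
    predict_perf: Callable[[Config], float],
    predict_cost: Callable[[Config], float],
    budget_s: float,
) -> Optional[Config]:
    """Pick the candidate with the highest predicted score among those
    whose predicted fine-tuning cost fits in the remaining budget.

    Returns None when no candidate is predicted to fit.
    """
    affordable = [c for c in candidates if predict_cost(c) <= budget_s]
    if not affordable:
        return None
    return max(affordable, key=predict_perf)


# Toy stand-ins for the meta-learned models (assumed values, not results).
a = Config(learning_rate=1e-3, batch_size=8)
b = Config(learning_rate=1e-4, batch_size=16)
c = Config(learning_rate=1e-5, batch_size=4)
candidates = [a, b, c]

toy_perf = {a: 0.70, b: 0.85, c: 0.78}     # predicted segmentation score
toy_cost = {a: 60.0, b: 200.0, c: 120.0}   # predicted seconds to fine-tune

# With a three-minute budget, b is predicted to be too expensive,
# so the best affordable candidate is chosen instead.
choice = select_config(candidates, toy_perf.get, toy_cost.get, budget_s=180.0)
```

With the toy numbers above, `choice` is `c`: configuration `b` has the highest predicted score but a predicted cost above the 180-second budget. Separating the performance predictor from the cost predictor is what lets a selector of this kind respect a tight time limit instead of always chasing the highest predicted score.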
