Quickly Tuning Foundation Models for Image Segmentation

Breenda Das; Lennart Purucker; Timur Carstensen; Frank Hutter

arXiv:2508.17283 · cs.CV · August 26, 2025

TL;DR

This paper introduces QTT-SEG, a meta-learning approach that automates and accelerates the fine-tuning of foundation models such as SAM for image segmentation, improving on SAM's zero-shot performance across binary and multiclass datasets under tight time budgets.

Contribution

QTT-SEG uses meta-learned cost and performance models to predict high-performing hyperparameter configurations, reducing the manual effort and time needed to fine-tune foundation models for domain-specific image segmentation tasks.

Findings

1. QTT-SEG improves on SAM's zero-shot performance on multiple datasets.
2. QTT-SEG surpasses AutoGluon Multimodal within three minutes on most binary tasks.
3. QTT-SEG provides consistent improvements on multiclass segmentation datasets.

Abstract

Foundation models like SAM (Segment Anything Model) exhibit strong zero-shot image segmentation performance, but often fall short on domain-specific tasks. Fine-tuning these models typically requires significant manual effort and domain expertise. In this work, we introduce QTT-SEG, a meta-learning-driven approach for automating and accelerating the fine-tuning of SAM for image segmentation. Built on the Quick-Tune hyperparameter optimization framework, QTT-SEG predicts high-performing configurations using meta-learned cost and performance models, efficiently navigating a search space of over 200 million possibilities. We evaluate QTT-SEG on eight binary and five multiclass segmentation datasets under tight time constraints. Our results show that QTT-SEG consistently improves upon SAM's zero-shot performance and surpasses AutoGluon Multimodal, a strong AutoML baseline, on most binary…
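The abstract describes choosing configurations with meta-learned cost and performance models under a tight time budget. The sketch below shows one common way to combine such predictions: rank candidates by expected improvement (EI) per second of predicted cost and greedily evaluate them until the budget runs out. It is an illustration, not QTT-SEG's or Quick-Tune's implementation. The `Candidate` fields, the function names, and the numbers in the usage example are hypothetical. The predictions are treated as fixed inputs rather than as outputs of meta-learned models, and a real optimizer would refit its surrogate after each evaluation.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str    # hypothetical configuration label
    mean: float  # predicted score (e.g. mIoU), assumed to come from a performance model
    std: float   # predictive uncertainty of that score
    cost: float  # predicted fine-tuning time in seconds, assumed to come from a cost model


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Closed-form EI for maximization under a Gaussian predictive distribution."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def select_next(cands, best, remaining_budget):
    """Return the affordable candidate with the highest EI per predicted second, or None."""
    affordable = [c for c in cands if c.cost <= remaining_budget]
    if not affordable:
        return None
    return max(
        affordable,
        key=lambda c: expected_improvement(c.mean, c.std, best) / max(c.cost, 1e-9),
    )


def run_budgeted_search(cands, evaluate, budget_s, best=0.0):
    """Greedy cost-aware loop; predictions stay fixed (a simplification)."""
    remaining = budget_s
    pool = list(cands)
    history = []
    while pool:
        c = select_next(pool, best, remaining)
        if c is None:
            break  # nothing left fits in the remaining budget
        score = evaluate(c)  # stand-in for actually fine-tuning and validating
        remaining -= c.cost
        best = max(best, score)
        history.append((c.name, score))
        pool.remove(c)
    return best, history


if __name__ == "__main__":
    # Hypothetical candidates; 0.65 plays the role of a zero-shot baseline score.
    cands = [
        Candidate("A", mean=0.70, std=0.05, cost=60.0),
        Candidate("B", mean=0.75, std=0.10, cost=150.0),
        Candidate("C", mean=0.60, std=0.20, cost=30.0),
    ]
    best, history = run_budgeted_search(cands, lambda c: c.mean, budget_s=180.0, best=0.65)
    print(best, history)
```

In this toy run, the cheap but uncertain candidate `C` is tried first because its EI per second is highest. `A` follows, and `B` is skipped once it no longer fits in the remaining 90 seconds. Dividing EI by cost is what makes the loop favor quick, informative evaluations when the time budget is short.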
