# Make me an Expert: Distilling from Generalist Black-Box Models into Specialized Models for Semantic Segmentation

**Authors:** Yasser Benigmim, Subhankar Roy, Khalid Oublal, Imad Eddine Marouf, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière

arXiv: 2509.00509 · 2025-09-03

## TL;DR

This paper introduces ATGC (ATtention-Guided sCaler), a method for distilling specialized semantic segmentation models from generalist black-box models that return only one-hot predictions. It addresses the black-box model's sensitivity to input resolution by using DINOv2 attention maps to select the scales at which the model is queried.

## Contribution

The paper proposes ATGC, a scale selection method leveraging attention maps for effective black-box model distillation in semantic segmentation, under realistic API access constraints.

## Key findings

- Substantial improvements under black-box supervision, using only one-hot API predictions.
- Entropy scores over DINOv2 attention maps identify informative scales for pseudo-labelling.
- Improvements are reported across multiple datasets.
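To make the entropy-based scale selection concrete, here is a minimal sketch of scoring one attention map per candidate scale and picking a scale from the scores. It is not the authors' implementation: the function names, the normalization by `log(N)` (used here so that maps of different sizes are comparable), and the choice to prefer the lowest-entropy scale are all assumptions made for illustration. The summary only states that attention maps are scored with entropy.

```python
import numpy as np


def normalized_attention_entropy(attn: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy of a non-negative attention map, scaled to [0, 1].

    The map is flattened and normalized to sum to 1. Dividing by log(N)
    is an assumption of this sketch, so that maps with different numbers
    of elements can be compared.
    """
    p = attn.astype(np.float64).ravel()
    p = p / (p.sum() + eps)
    entropy = -(p * np.log(p + eps)).sum()
    max_entropy = np.log(p.size) if p.size > 1 else 1.0
    return float(entropy / max_entropy)


def select_scale(attn_by_scale: dict, prefer_low_entropy: bool = True):
    """Score each scale's attention map and return (chosen_scale, scores).

    `prefer_low_entropy=True` treats focused (peaked) attention as more
    informative. This direction is a hypothetical choice, exposed as a flag.
    """
    scores = {s: normalized_attention_entropy(a) for s, a in attn_by_scale.items()}
    pick = min if prefer_low_entropy else max
    return pick(scores, key=scores.get), scores


# Toy attention maps: uniform (diffuse) at scale 0.5, peaked at scale 1.0.
uniform = np.ones((4, 4))
peaked = np.zeros((8, 8))
peaked[3, 3] = 1.0
best_scale, scale_scores = select_scale({0.5: uniform, 1.0: peaked})
```

In a real pipeline the attention maps would come from a DINOv2 forward pass on the image resized to each candidate scale, and the chosen scale would determine the resolution at which the black-box API is queried.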

## Abstract

The rise of Artificial Intelligence as a Service (AIaaS) democratizes access to pre-trained models via Application Programming Interfaces (APIs), but also raises a fundamental question: how can local models be effectively trained using black-box models that do not expose their weights, training data, or logits, a constraint in which current domain adaptation paradigms are impractical? To address this challenge, we introduce the Black-Box Distillation (B2D) setting, which enables local model adaptation under realistic constraints: (1) the API model is open-vocabulary and trained on large-scale general-purpose data, and (2) access is limited to one-hot predictions only. We identify that open-vocabulary models exhibit significant sensitivity to input resolution, with different object classes being segmented optimally at different scales, a limitation termed the "curse of resolution". Our method, ATtention-Guided sCaler (ATGC), addresses this challenge by leveraging DINOv2 attention maps to dynamically select optimal scales for black-box model inference. ATGC scores the attention maps with entropy to identify informative scales for pseudo-labelling, enabling effective distillation. Experiments demonstrate substantial improvements under black-box supervision across multiple datasets while requiring only one-hot API predictions. Our code is available at https://github.com/yasserben/ATGC.
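Because the B2D setting exposes only one-hot predictions, the API output can serve as a hard pseudo-label map but not as a soft distribution. The sketch below shows one common way to train a local model on such labels: per-pixel cross-entropy against the predicted class indices. The abstract does not specify the training loss, so the loss choice, the function name, and the `ignore_index` convention are assumptions for illustration.

```python
import numpy as np


def hard_label_cross_entropy(logits: np.ndarray, labels: np.ndarray,
                             ignore_index: int = 255) -> float:
    """Mean per-pixel cross-entropy of student logits against hard pseudo-labels.

    logits: (C, H, W) student scores; labels: (H, W) integer class indices
    taken from the black-box model's one-hot predictions. Pixels equal to
    `ignore_index` (a hypothetical convention) are excluded from the mean.
    """
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=0, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))

    valid = labels != ignore_index
    if not valid.any():
        return 0.0

    # Replace ignored labels with a safe index before gathering.
    safe = np.where(valid, labels, 0).astype(np.int64)
    picked = np.take_along_axis(log_probs, safe[None, ...], axis=0)[0]
    return float(-picked[valid].mean())


# Toy example: 3 classes, 2x2 image, uninformative student logits.
toy_logits = np.zeros((3, 2, 2))
toy_labels = np.array([[0, 1], [2, 255]])
toy_loss = hard_label_cross_entropy(toy_logits, toy_labels)
```

With all-zero logits the student predicts a uniform distribution over the three classes, so the loss equals `log(3)` regardless of the labels. No soft-target or temperature-based distillation term appears here, since those require logits or probabilities that the API does not return.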

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00509/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00509/full.md

## References

72 references — full list in the complete paper: https://tomesphere.com/paper/2509.00509/full.md

---
Source: https://tomesphere.com/paper/2509.00509