SAFE-KD: Risk-Controlled Early-Exit Distillation for Vision Backbones

Salim Khazem

arXiv:2602.03043 · cs.LG · February 4, 2026

TL;DR

SAFE-KD is a universal framework for vision backbones that combines hierarchical distillation and conformal risk control to enable safe early exits, reducing inference costs while maintaining accuracy and providing risk guarantees.

Contribution

The paper introduces SAFE-KD, a multi-exit distillation method with conformal risk control for vision backbones. Calibrated per-exit thresholds give early exits a finite-sample guarantee on selective misclassification risk.
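The early-exit decision rule can be illustrated with a small sketch. It assumes each exit head outputs a probability vector and that confidence is the maximum class probability. The function name `early_exit_predict` and the max-probability score are illustrative assumptions, not details confirmed by the paper.

```python
from typing import Sequence, Tuple


def early_exit_predict(
    exit_probs: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> Tuple[int, int]:
    """Return (exit_index, predicted_class) for one input.

    exit_probs[k] is the class-probability vector of exit k, ordered from
    shallow to deep; the last entry is the final classifier.
    thresholds[k] is the stopping threshold of early exit k (assumed to be
    calibrated beforehand), so there is one fewer threshold than exits.
    """
    if not exit_probs:
        raise ValueError("need at least one exit")
    if len(thresholds) != len(exit_probs) - 1:
        raise ValueError("need exactly one threshold per early exit")

    last = len(exit_probs) - 1
    for k, probs in enumerate(exit_probs):
        # Predicted class and its confidence (assumption: max probability).
        label = max(range(len(probs)), key=probs.__getitem__)
        confidence = probs[label]
        # The final exit always answers; an early exit answers only when
        # its confidence reaches its threshold.
        if k == last or confidence >= thresholds[k]:
            return k, label
    raise AssertionError("unreachable: the final exit always returns")


# Toy example: three exits, two classes.
probs_per_exit = [[0.6, 0.4], [0.1, 0.9], [0.2, 0.8]]
print(early_exit_predict(probs_per_exit, thresholds=[0.7, 0.85]))
```

In a real backbone the deeper exits would be computed lazily, only after a shallower exit declines to answer; that laziness is where the compute saving comes from. The sketch takes all probabilities up front purely for readability.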

Findings

01. Improves the accuracy-compute trade-off across multiple datasets and architectures.

02. Yields stronger calibration and a user-specified bound on selective misclassification risk.

03. Maintains robustness under data corruption.

Abstract

Early-exit networks reduce inference cost by allowing "easy" inputs to stop early, but practical deployment hinges on knowing when early exit is safe. We introduce SAFE-KD, a universal multi-exit wrapper for modern vision backbones that couples hierarchical distillation with conformal risk control. SAFE-KD attaches lightweight exit heads at intermediate depths, distills a strong teacher into all exits via Decoupled Knowledge Distillation (DKD), and enforces deep-to-shallow consistency between exits. At inference, we calibrate per-exit stopping thresholds on a held-out set using conformal risk control (CRC) to guarantee a user-specified selective misclassification risk (among the samples that exit early) under exchangeability. Across multiple datasets and architectures, SAFE-KD yields improved accuracy-compute trade-offs, stronger calibration, and robust…
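The threshold calibration step can be sketched for a single exit as follows. This is a simplified illustration, not the paper's exact CRC procedure: the function name `calibrate_exit_threshold`, the conservative `(errors + 1) / (selected + 1)` estimate, and the toy calibration data are all assumptions made for the example.

```python
from typing import Optional, Sequence


def calibrate_exit_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    alpha: float,
) -> Optional[float]:
    """Pick a stopping threshold for one exit from held-out data.

    Returns the lowest confidence threshold t such that, among calibration
    samples with confidence >= t, the conservative error estimate
    (errors + 1) / (selected + 1) is at most alpha. Returns None when no
    threshold qualifies, meaning this exit should never stop early.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")

    # Scan candidate thresholds from most to least confident.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    best: Optional[float] = None
    errors = 0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Consume every sample tied at confidence t, so that `i` equals the
        # number of samples with confidence >= t.
        while i < len(pairs) and pairs[i][0] == t:
            if not pairs[i][1]:
                errors += 1
            i += 1
        # Assumed finite-sample correction: count one extra pessimistic error.
        if (errors + 1) / (i + 1) <= alpha:
            best = t  # lower thresholds let more samples exit early
    return best


# Toy held-out set for one exit (hypothetical numbers).
conf = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60]
hit = [True, True, True, True, False, True]
print(calibrate_exit_threshold(conf, hit, alpha=0.25))
```

A lower threshold lets more inputs stop at this exit, so the sketch keeps the lowest threshold that still satisfies the risk target. Any guarantee of this kind relies on the calibration set being exchangeable with the test data, as the abstract states.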

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · CCD and CMOS Imaging Sensors