SAFE-KD: Risk-Controlled Early-Exit Distillation for Vision Backbones
Salim Khazem

TL;DR
SAFE-KD is a universal framework for vision backbones that combines hierarchical distillation and conformal risk control to enable safe early exits, reducing inference costs while maintaining accuracy and providing risk guarantees.
Contribution
The paper introduces SAFE-KD, a multi-exit distillation method for vision models that uses conformal risk control to calibrate early-exit thresholds, giving finite-sample guarantees on early-exit risk under exchangeability.
Findings
Improves the accuracy-compute trade-off across multiple datasets and architectures.
Yields stronger calibration alongside the risk guarantee.
Maintains robustness under data corruption.
Abstract
Early-exit networks reduce inference cost by allowing "easy" inputs to stop early, but practical deployment hinges on knowing when an early exit is safe. We introduce SAFE-KD, a universal multi-exit wrapper for modern vision backbones that couples hierarchical distillation with conformal risk control. SAFE-KD attaches lightweight exit heads at intermediate depths, distills a strong teacher into all exits via Decoupled Knowledge Distillation (DKD), and enforces deep-to-shallow consistency between exits. At inference, we calibrate per-exit stopping thresholds on a held-out set using conformal risk control (CRC) to guarantee a user-specified selective misclassification risk (among the samples that exit early) under exchangeability. Across multiple datasets and architectures, SAFE-KD yields improved accuracy-compute trade-offs, stronger calibration, and robust…
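To make the calibration step concrete, here is a minimal sketch of threshold selection for a single exit head, followed by the resulting stopping rule. It is not the paper's procedure. It applies the standard CRC rule for a bounded, monotone loss, choosing the smallest threshold t with (n/(n+1)) * R_hat(t) + 1/(n+1) <= alpha, where the loss is "the sample exits at this head and is misclassified". That controls the joint error rate over all inputs, which is a simpler quantity than the selective risk (error among exiting samples) named in the abstract. The function names, the strict `>` exit rule, and the assumption that confidences lie in (0, 1] are illustrative choices, not taken from the source.

```python
import math

def calibrate_exit_threshold(confidences, correct, alpha):
    """Pick a stopping threshold for one exit head from held-out data.

    Hypothetical helper. A sample exits when its confidence is strictly
    greater than the threshold. The loss is 1 when a sample exits AND is
    wrong, so the empirical risk R_hat(t) can only shrink as t grows.
    Returns the smallest t with (n/(n+1)) * R_hat(t) + 1/(n+1) <= alpha,
    or math.inf when no threshold qualifies (this head never exits).
    """
    n = len(confidences)
    if n == 0 or len(correct) != n:
        raise ValueError("need equally sized, non-empty calibration lists")
    # The CRC condition simplifies to (errors + 1) / (n + 1) <= alpha.
    # The small epsilon guards against floating-point round-off.
    max_errors = math.floor(alpha * (n + 1) - 1 + 1e-12)
    if max_errors < 0:
        return math.inf
    # Confidences of the misclassified calibration samples, highest first.
    wrong = sorted((c for c, ok in zip(confidences, correct) if not ok),
                   reverse=True)
    if len(wrong) <= max_errors:
        return 0.0  # every sample may exit (confidences assumed in (0, 1])
    # Exiting strictly above wrong[max_errors] admits at most max_errors errors.
    return wrong[max_errors]

def first_safe_exit(exit_confidences, thresholds):
    """Index of the first exit whose confidence clears its threshold.

    exit_confidences covers every head including the final one;
    thresholds covers the intermediate heads only, so the final head
    always accepts.
    """
    for i, (conf, t) in enumerate(zip(exit_confidences, thresholds)):
        if conf > t:
            return i
    return len(exit_confidences) - 1
```

For example, with nine calibration samples, four of which are wrong at confidences 0.85, 0.6, 0.5 and 0.4, and alpha = 0.2, the error budget is one sample, so the threshold lands on the second most confident error (0.6). Calibrating each head separately like this does not by itself give a guarantee over the full cascade; how the per-exit guarantees combine is left to the paper.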
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · CCD and CMOS Imaging Sensors
