How to Train Your Multi-Exit Model? Analyzing the Impact of Training Strategies
Piotr Kubaty, Bartosz Wójcik, Bartłomiej Krzepkowski, Monika Michaluk, Tomasz Trzciński, Jary Pomponi, Kamil Adamczewski

TL;DR
This paper analyzes how different training strategies affect multi-exit neural networks, introduces metrics for analysis, and proposes a mixed training approach that improves performance and efficiency.
Contribution
The paper introduces a set of metrics for analyzing early-exit training dynamics and proposes a novel mixed training strategy for multi-exit models, demonstrating its advantages over joint and disjoint training.
Findings
Conventional joint and disjoint training strategies are suboptimal.
The proposed mixed training strategy improves performance and efficiency.
Comprehensive evaluations validate the effectiveness of the new approach.
Abstract
Early exits enable the network's forward pass to terminate early by attaching trainable internal classifiers to the backbone network. Existing early-exit methods typically adopt either a joint training approach, where the backbone and exit heads are trained simultaneously, or a disjoint approach, where the heads are trained separately. However, the implications of this choice are often overlooked, with studies typically adopting one approach without adequate justification. This choice influences training dynamics, yet its impact remains largely unexplored. In this paper, we introduce a set of metrics to analyze early-exit training dynamics and guide the choice of training strategy. We demonstrate that the conventionally used joint and disjoint regimes yield suboptimal performance. To address these limitations, we propose a mixed training strategy: the backbone is trained first, followed by…
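To make the difference between the two conventional regimes concrete, here is a minimal sketch in plain Python that only lists which parameter groups would be updated in each phase and on which losses. It is not the authors' code. The names `Phase`, `training_schedule`, `final_loss`, and `exit_loss_i` are hypothetical, and two things are assumed rather than stated in the abstract: that the final classifier counts as part of the backbone, and that the backbone stays frozen while the heads are trained in the disjoint regime. The mixed strategy is left out because its second phase is not given in the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phase:
    """One training phase: which parameter groups are updated, and on which losses."""
    name: str
    trainable: Tuple[str, ...]
    losses: Tuple[str, ...]


def training_schedule(regime: str, num_exits: int) -> List[Phase]:
    """Return the phases of a regime for a backbone with `num_exits` exit heads.

    Assumption: the final classifier is part of the backbone and is trained on
    "final_loss"; exit head i is trained on "exit_loss_i".
    """
    heads = tuple(f"head_{i}" for i in range(num_exits))
    exit_losses = tuple(f"exit_loss_{i}" for i in range(num_exits))

    if regime == "joint":
        # Backbone and all exit heads are optimized simultaneously.
        return [Phase("joint", ("backbone",) + heads, ("final_loss",) + exit_losses)]
    if regime == "disjoint":
        # Heads are trained separately; the backbone is assumed frozen meanwhile.
        return [
            Phase("backbone_only", ("backbone",), ("final_loss",)),
            Phase("heads_only", heads, exit_losses),
        ]
    raise ValueError(f"unknown regime: {regime!r}")


if __name__ == "__main__":
    for regime in ("joint", "disjoint"):
        for phase in training_schedule(regime, num_exits=2):
            print(regime, phase.name, phase.trainable, phase.losses)
```

In a real framework, each `Phase` would correspond to an optimizer built over only the `trainable` groups and a loss summed over the listed terms; the sketch leaves that mapping open.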
