VISTA: Validation-Informed Trajectory Adaptation via Self-Distillation
Eli Corn, Daphna Weinshall

TL;DR
VISTA is a self-distillation framework that improves the robustness and generalization of deep learning models through validation-informed trajectory regularization and an ensemble of expert anchors.
Contribution
VISTA introduces a novel online self-distillation method that uses validation signals to identify and ensemble earlier model states, improving training stability and performance.
Findings
VISTA outperforms standard training and prior self-distillation methods on multiple benchmarks.
A lightweight implementation reduces storage overhead by 90% without performance loss.
VISTA enhances model robustness and generalization across various tasks.
Abstract
Deep learning models may converge to suboptimal solutions despite strong validation accuracy, masking an optimization failure we term Trajectory Deviation. This occurs because, as training proceeds, a model can abandon states that generalize well on specific data sub-populations, discarding previously learned latent features without triggering classical overfitting signals. To address this problem, we introduce VISTA, an online self-distillation framework that enforces consistency along the optimization trajectory. Using a validation-informed Marginal Coverage score, VISTA identifies expert anchors: earlier model states that retain specialized competence over distinct data regions. A coverage-weighted ensemble of these anchors is integrated online during training, regularizing the loss landscape and preserving mastered knowledge. When evaluated across multiple benchmarks, VISTA…
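The abstract does not define the Marginal Coverage score or the exact form of the ensemble, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes that an anchor's marginal coverage is the fraction of validation samples it classifies correctly that no previously selected anchor already covers (a greedy set-cover heuristic), that the teacher is the coverage-weighted average of the anchors' softmax outputs, and that the student is regularized with a KL term. All function names (`marginal_coverage`, `select_anchors`, `ensemble_teacher`, `distillation_loss`) are hypothetical.

```python
import numpy as np

def marginal_coverage(correct, covered):
    """ASSUMED definition: share of validation samples a checkpoint gets
    right that are not yet covered by previously selected anchors.
    correct: (n_checkpoints, n_val) bool, covered: (n_val,) bool."""
    return (correct & ~covered).mean(axis=1)

def select_anchors(correct, max_anchors=3):
    """Greedily pick checkpoints with the largest marginal coverage.
    Returns anchor indices and weights normalized to sum to 1."""
    covered = np.zeros(correct.shape[1], dtype=bool)
    anchors, scores = [], []
    for _ in range(max_anchors):
        gain = marginal_coverage(correct, covered)
        best = int(np.argmax(gain))
        if gain[best] == 0.0:      # nothing new to cover, stop early
            break
        anchors.append(best)
        scores.append(float(gain[best]))
        covered |= correct[best]
    if not anchors:
        return [], np.array([])
    w = np.array(scores)
    return anchors, w / w.sum()

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_teacher(anchor_logits, weights):
    """Coverage-weighted average of anchor predictions.
    anchor_logits: (n_anchors, batch, classes) -> (batch, classes)."""
    return np.tensordot(weights, softmax(anchor_logits), axes=1)

def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """Mean KL(teacher || student) over the batch."""
    log_student = np.log(softmax(student_logits) + eps)
    log_teacher = np.log(teacher_probs + eps)
    return float(np.mean(np.sum(teacher_probs * (log_teacher - log_student), axis=-1)))

# Toy example: 3 saved checkpoints evaluated on 6 validation samples.
correct = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
], dtype=bool)
anchors, weights = select_anchors(correct)

rng = np.random.default_rng(0)
anchor_logits = rng.normal(size=(len(anchors), 4, 5))  # 4 samples, 5 classes
teacher = ensemble_teacher(anchor_logits, weights)
student_logits = rng.normal(size=(4, 5))
kl = distillation_loss(student_logits, teacher)
```

In this toy case the second checkpoint is chosen first because it covers the most samples, and the third is added for the two samples it alone gets right; the first checkpoint contributes nothing new and is skipped. In a training loop, a term like `kl` would typically be added to the task loss with a weighting coefficient, which is again an assumption rather than something the abstract specifies.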
