VISTA: Validation-Informed Trajectory Adaptation via Self-Distillation

Eli Corn; Daphna Weinshall

arXiv:2604.12044·cs.LG·April 15, 2026

TL;DR

VISTA is a self-distillation framework that improves the robustness and generalization of deep learning models through validation-informed trajectory regularization and a coverage-weighted ensemble of expert anchors.

Contribution

It introduces an online self-distillation method that uses validation signals to identify earlier model states and ensemble them during training, with the aim of improving training stability and performance.

Findings

01. VISTA outperforms standard training and prior self-distillation methods on multiple benchmarks.

02. A lightweight implementation reduces storage overhead by 90% without performance loss.

03. VISTA enhances model robustness and generalization across various tasks.

Abstract

Deep learning models may converge to suboptimal solutions despite strong validation accuracy, masking an optimization failure we term Trajectory Deviation. This happens because, as training proceeds, models can abandon high-generalization states for specific data sub-populations, discarding previously learned latent features without triggering classical overfitting signals. To address this problem, we introduce VISTA, an online self-distillation framework that enforces consistency along the optimization trajectory. Using a validation-informed Marginal Coverage score, VISTA identifies expert anchors, which are earlier model states that retain specialized competence over distinct data regions. A coverage-weighted ensemble of these anchors is integrated online during training, regularizing the loss landscape and preserving mastered knowledge. When evaluated across multiple benchmarks, VISTA…
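The mechanism described in the abstract can be illustrated with a small sketch. The abstract does not define the Marginal Coverage score or the loss, so everything below is an assumption made for illustration: marginal coverage is taken to be the fraction of validation examples a checkpoint classifies correctly that no already-selected anchor does, anchors are picked greedily, and the distillation term is a KL divergence added to cross-entropy. The function names (`marginal_coverage`, `select_anchors`, `ensemble_targets`, `distillation_loss`) and the parameter `lam` are hypothetical and may differ from the authors' formulation.

```python
import math

def marginal_coverage(correct, selected):
    """ASSUMED definition: fraction of validation examples this checkpoint
    gets right that none of the already-selected anchors gets right."""
    n = len(correct)
    gained = 0
    for i in range(n):
        already_covered = any(anchor[i] for anchor in selected)
        if correct[i] and not already_covered:
            gained += 1
    return gained / n

def select_anchors(checkpoints, max_anchors=3):
    """Greedily pick checkpoints by marginal coverage (an assumed procedure).
    `checkpoints` maps a name to per-example validation correctness.
    Returns normalized ensemble weights per chosen anchor."""
    chosen, scores = [], []
    remaining = dict(checkpoints)
    while remaining and len(chosen) < max_anchors:
        selected = [checkpoints[name] for name in chosen]
        gains = {name: marginal_coverage(c, selected)
                 for name, c in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] == 0:  # nothing new is covered, stop early
            break
        chosen.append(best)
        scores.append(gains[best])
        del remaining[best]
    total = sum(scores)
    return {name: s / total for name, s in zip(chosen, scores)}

def ensemble_targets(anchor_probs, weights):
    """Coverage-weighted average of the anchors' class probabilities."""
    num_classes = len(next(iter(anchor_probs.values())))
    teacher = [0.0] * num_classes
    for name, w in weights.items():
        for k, p in enumerate(anchor_probs[name]):
            teacher[k] += w * p
    return teacher

def distillation_loss(student_probs, label, teacher_probs, lam=1.0, eps=1e-12):
    """Cross-entropy on the label plus lam * KL(teacher || student).
    The exact loss used by VISTA is not given in the abstract."""
    ce = -math.log(student_probs[label] + eps)
    kl = sum(t * math.log((t + eps) / (s + eps))
             for t, s in zip(teacher_probs, student_probs) if t > 0)
    return ce + lam * kl

# Toy validation correctness for three saved checkpoints (made-up data).
val_correct = {
    "epoch_5":  [True, True, False, False, False, False],
    "epoch_10": [False, False, True, True, True, False],
    "epoch_15": [False, True, True, False, False, False],
}
weights = select_anchors(val_correct)
teacher = ensemble_targets(
    {"epoch_10": [0.7, 0.3], "epoch_5": [0.2, 0.8]}, weights)
loss = distillation_loss([0.5, 0.5], label=0, teacher_probs=teacher)
```

In this toy run, `epoch_15` is never selected because every example it classifies correctly is already covered by the two earlier picks. That is the intuition behind weighting anchors by the distinct data regions they cover rather than by raw validation accuracy.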
