Better Supervisory Signals by Observing Learning Paths

Yi Ren; Shangmin Guo; Danica J. Sutherland

arXiv:2203.02485 · stat.ML · March 7, 2022 · 5 cites

TL;DR

This paper investigates how observing a model's learning paths (the trajectory of its predictions on each training sample during training) can yield better supervisory signals, leading to improved knowledge distillation and classification performance.

Contribution

It introduces a new perspective on supervision by analyzing learning trajectories and proposes Filter-KD, a novel knowledge distillation method that leverages this insight.

Findings

1. Models can refine "bad" labels through zig-zag learning paths.

2. Learning path observation offers new insights into knowledge distillation and overfitting.

3. Filter-KD improves classification performance across various tasks.

Abstract

Better-supervised models might have better performance. In this paper, we first clarify what makes for good supervision for a classification problem, and then explain two existing label refining methods, label smoothing and knowledge distillation, in terms of our proposed criterion. To further answer why and how better supervision emerges, we observe the learning path, i.e., the trajectory of the model's predictions during training, for each training sample. We find that the model can spontaneously refine "bad" labels through a "zig-zag" learning path, which occurs on both toy and real datasets. Observing the learning path not only provides a new perspective for understanding knowledge distillation, overfitting, and learning dynamics, but also reveals that the supervisory signal of a teacher network can be very unstable near the best points in training on real tasks. Inspired by this,…
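
To make the terms in the abstract concrete, here is a minimal pure-Python sketch of the two label refining methods it names and of recording a learning path for one sample. It is an illustration under stated assumptions, not the authors' implementation: the function names (`smooth_label`, `distillation_target`, `record_learning_path`), the temperature, the learning rate, and the use of free per-sample logits in place of a real network are all hypothetical choices. A single sample with free logits is not intended to reproduce the zig-zag behaviour described above.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_label(true_class, num_classes, eps=0.1):
    """Label smoothing: mix the one-hot label with the uniform distribution."""
    return [
        (1.0 - eps) * (1.0 if k == true_class else 0.0) + eps / num_classes
        for k in range(num_classes)
    ]


def distillation_target(teacher_logits, temperature=2.0):
    """Knowledge distillation target: temperature-softened teacher prediction."""
    return softmax(teacher_logits, temperature)


def record_learning_path(init_logits, target, steps=50, lr=0.5):
    """Record the prediction after every update for a single sample.

    Assumption: the "model" is just a vector of free logits trained by
    gradient descent on cross-entropy against `target`. For a target that
    sums to 1, the gradient with respect to the logits is (probs - target).
    """
    logits = list(init_logits)
    path = [softmax(logits)]
    for _ in range(steps):
        probs = softmax(logits)
        logits = [z - lr * (p - t) for z, p, t in zip(logits, probs, target)]
        path.append(softmax(logits))
    return path


if __name__ == "__main__":
    target = smooth_label(true_class=0, num_classes=3, eps=0.1)
    path = record_learning_path([0.0, 0.0, 0.0], target)
    # Print a few points along the trajectory of predictions.
    for step in (0, 10, 50):
        print(step, [round(p, 3) for p in path[step]])
```

In a real experiment the path would be collected from a trained network by storing its softmax output for each training sample at every epoch, rather than from free logits as in this toy.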

Code & Models

Repositories

joshua-ren/better_supervisory_signal (PyTorch, official)

Videos

Better Supervisory Signals by Observing Learning Paths · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Neural Networks and Applications

Methods: Knowledge Distillation · Label Smoothing