Knowledge Transfer with Jacobian Matching

Suraj Srinivas; Francois Fleuret

arXiv:1803.00443 · cs.LG · March 2, 2018 · 58 citations

TL;DR

This paper introduces a principled approach to Jacobian matching for neural network distillation, linking it to input noise methods, and demonstrates its benefits for transfer learning and robustness.

Contribution

It establishes an equivalence between Jacobian matching and input noise-based distillation, deriving suitable loss functions and applying this to improve transfer learning.

Findings

01. Jacobian matching improves distillation performance
02. Enhances robustness to noisy inputs
03. Benefits transfer learning tasks

Abstract

Classical distillation methods transfer representations from a "teacher" neural network to a "student" network by matching their output activations. Recent methods also match the Jacobians, or the gradient of output activations with respect to the input. However, this involves making some ad hoc decisions, in particular, the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning by establishing equivalence of a recent transfer learning procedure to distillation. We then show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning.
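
The link between Jacobian matching and distillation with input noise can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a squared-error loss, Gaussian input noise with standard deviation `sigma`, and toy linear teacher and student maps (for which the relation holds exactly; for nonlinear networks it would hold only approximately for small noise). The helper names `numerical_jacobian`, `jacobian_matching_loss`, and `noisy_distillation_loss` are hypothetical, and the Jacobian is estimated by finite differences rather than automatic differentiation.

```python
import numpy as np


def numerical_jacobian(f, x, eps=1e-5):
    """Central finite-difference Jacobian of f at x, shape (out_dim, in_dim)."""
    x = np.asarray(x, dtype=float)
    columns = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        columns.append((f(x + step) - f(x - step)) / (2.0 * eps))
    return np.stack(columns, axis=1)


def jacobian_matching_loss(teacher, student, x, sigma):
    """Output-matching term plus a sigma^2-weighted Jacobian-matching term."""
    output_gap = teacher(x) - student(x)
    jacobian_gap = numerical_jacobian(teacher, x) - numerical_jacobian(student, x)
    return np.sum(output_gap ** 2) + sigma ** 2 * np.sum(jacobian_gap ** 2)


def noisy_distillation_loss(teacher, student, x, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the expected squared output gap under input noise.

    Assumes teacher and student accept a batch of inputs of shape (n, in_dim).
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_samples, x.size))
    gaps = teacher(x + noise) - student(x + noise)
    return np.mean(np.sum(gaps ** 2, axis=1))


# Toy linear "networks" (an assumption for illustration): T(x) = A x, S(x) = B x.
setup_rng = np.random.default_rng(1)
A = setup_rng.normal(size=(3, 4))
B = setup_rng.normal(size=(3, 4))
x = setup_rng.normal(size=4)
sigma = 0.1

teacher = lambda v: v @ A.T
student = lambda v: v @ B.T

loss_jm = jacobian_matching_loss(teacher, student, x, sigma)
loss_noisy = noisy_distillation_loss(teacher, student, x, sigma)
print(f"Jacobian-matching loss:  {loss_jm:.4f}")
print(f"Noisy distillation loss: {loss_noisy:.4f}")
```

In this sketch the two printed values should agree up to Monte Carlo error, which is the sense in which penalizing the Jacobian gap stands in for training the student on noisy inputs. The weight `sigma ** 2` on the Jacobian term follows from the noise variance under the stated assumptions.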

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Optimization and Search Problems