On The Distribution of Penultimate Activations of Classification Networks
Minkyo Seo, Yoonho Lee, Suha Kwak

TL;DR
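This paper shows that, when a classification network is trained with the cross-entropy loss, its final classification layer forms a Generative-Discriminative pair with a generative classifier over penultimate activations. The resulting generative model, parameterized by the final-layer weights, enables stable knowledge distillation under domain shift and transfers classifier knowledge to variational autoencoders and generative adversarial networks for class-conditional image generation.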
Contribution
It treats the distribution of penultimate activations as a generative model parameterized by the weights of the final fully-connected layer, which supports new ways of transferring knowledge from a trained classifier.
Findings
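The distribution of penultimate activations is parameterized by the weights of the final fully-connected layer and can synthesize activations without feeding input data.
This generative model enables stable knowledge distillation in the presence of domain shift.
It transfers knowledge from a classifier to variational autoencoders and generative adversarial networks for class-conditional image generation.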
Abstract
This paper studies probability distributions of penultimate activations of classification networks. We show that, when a classification network is trained with the cross-entropy loss, its final classification layer forms a Generative-Discriminative pair with a generative classifier based on a specific distribution of penultimate activations. More importantly, the distribution is parameterized by the weights of the final fully-connected layer, and can be considered as a generative model that synthesizes the penultimate activations without feeding input data. We empirically demonstrate that this generative model enables stable knowledge distillation in the presence of domain shift, and can transfer knowledge from a classifier to variational autoencoders and generative adversarial networks for class-conditional image generation.
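The idea of a Generative-Discriminative pair can be illustrated with a small sketch. The abstract does not name the specific distribution, so the code below assumes, purely for illustration, unit-variance Gaussian class-conditionals centered at the final-layer weight vectors; this is the textbook pairing and may differ from the distribution the paper derives. The weights `W`, the uniform prior, and all function names are hypothetical. Under that assumption, Bayes' rule on the generative side gives the same posterior as a softmax over linear logits, and activations can be sampled from the weights alone.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear_posterior(a, W, b):
    # Discriminative side: softmax over final-layer logits w_k . a + b_k.
    logits = [sum(w_i * a_i for w_i, a_i in zip(w, a)) + b_k
              for w, b_k in zip(W, b)]
    return softmax(logits)

def generative_posterior(a, W, log_prior):
    # Generative side: Bayes' rule with the ASSUMED form p(a | y=k) = N(a; w_k, I).
    # The Gaussian normalizer is shared by all classes, so it is dropped.
    log_joint = [-0.5 * sum((a_i - w_i) ** 2 for a_i, w_i in zip(a, w)) + lp
                 for w, lp in zip(W, log_prior)]
    return softmax(log_joint)

def sample_activation(k, W, rng):
    # Draw a synthetic activation for class k using only the weights, no input data.
    return [rng.gauss(w_i, 1.0) for w_i in W[k]]

# Hypothetical final-layer weights: 3 classes, 4-dimensional activations.
W = [[2.0, 0.0, -1.0, 0.5],
     [-1.0, 1.5, 0.0, 0.0],
     [0.0, -2.0, 1.0, 1.0]]
log_prior = [math.log(1.0 / 3)] * 3

# Bias under which the linear classifier matches the Gaussian generative classifier.
b = [-0.5 * sum(w_i ** 2 for w_i in w) + lp for w, lp in zip(W, log_prior)]

rng = random.Random(0)
a = sample_activation(0, W, rng)
p_disc = linear_posterior(a, W, b)
p_gen = generative_posterior(a, W, log_prior)
```

The two posteriors agree because expanding the squared distance leaves a term in `a` alone that is common to every class and cancels inside the softmax. In a distillation setting, samples such as `a` could stand in for activations of real inputs, which is the role the abstract describes for the generative model.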
Taxonomy
Topics: Machine Learning and Algorithms · Neural Networks and Applications · Bayesian Modeling and Causal Inference
Methods: Knowledge Distillation
