Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning

Chris Lengerich; Gabriel Synnaeve; Amy Zhang; Hugh Leather; Kurt Shuster; François Charton; Charysse Redwood

arXiv:2212.11353·cs.CL·December 23, 2022

TL;DR

This paper introduces contrastive distillation, a self-supervised loss policy that enhances transfer learning efficiency by leveraging high mutual information between source and target tasks, outperforming traditional methods.

Contribution

The paper proposes a novel contrastive distillation method that improves transfer learning by efficiently sampling negative examples and capturing high mutual information.
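As a rough illustration of what a contrastive objective with negative examples looks like, the sketch below computes a generic InfoNCE-style loss between paired "student" and "teacher" embeddings, using the other items in the batch as negatives. This is not the paper's loss: the function name `info_nce_loss`, the `temperature` parameter, and the in-batch negative scheme are all assumptions made for illustration. Minimizing a loss of this form is commonly used to tighten a lower bound on the mutual information between the two views.

```python
import math


def info_nce_loss(student, teacher, temperature=0.1):
    """Average InfoNCE-style loss over a batch of paired embeddings.

    Hypothetical helper, not taken from the paper.
    student[i] and teacher[i] form the positive pair; every teacher[j]
    with j != i acts as an in-batch negative for student[i].
    """
    def normalize(vec):
        # Scale to unit length so dot products are cosine similarities.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    s = [normalize(v) for v in student]
    t = [normalize(v) for v in teacher]

    total = 0.0
    for i, s_i in enumerate(s):
        # Similarity of this student embedding to every teacher embedding.
        logits = [sum(a * b for a, b in zip(s_i, t_j)) / temperature for t_j in t]
        # Numerically stable log-sum-exp over positives and negatives.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching index as the correct class.
        total += log_denominator - logits[i]
    return total / len(s)


# Aligned pairs should score a lower loss than deliberately mismatched ones.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this form the choice of negatives is simply "everything else in the batch"; a method that samples negatives more selectively would change how the `logits` list is built, not the overall shape of the loss.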

Findings

01. Outperforms common transfer learning methods

02. Enables more efficient sampling of negative examples

03. Facilitates rapid adaptation in diverse subspaces

Abstract

Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches have been adopted to refine representations via auxiliary self-supervised losses while simultaneously learning decision policies, learning compositional representations from hand-designed and context-independent self-supervised losses (multi-view) still adapts relatively slowly to the real world, which contains many non-IID subspaces requiring rapid distribution shift in both time and spatial attention patterns at varying levels of abstraction. In contrast, supervised language model cascades have shown the flexibility to adapt to many diverse manifolds, and hints of self-learning needed for autonomous task transfer. However, to date, transfer methods…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Speech and dialogue systems

Methods: Self-Learning