Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation

Leonard Tang; Tom Shlomi; Alexander Cai

arXiv:2303.05593·cs.LG·March 13, 2023·1 cites

Open Access

TL;DR

This paper shows that a Trojan can be embedded in a student model during knowledge distillation: the attack reduces student accuracy without changing teacher performance, exposing a security vulnerability in the distillation process.

Contribution

The paper introduces a Trojan attack carried out during knowledge distillation on unlabelled data. According to the abstract, the attack reduces student accuracy, leaves teacher performance unchanged, and is efficient to construct in practice.

Findings

01

Trojan attacks can be embedded during knowledge distillation.

02

The attack reduces student model accuracy.

03

The attack does not alter teacher performance.

Abstract

In recent years, knowledge distillation has become a cornerstone of efficiently deployed machine learning, with labs and industries using knowledge distillation to train models that are inexpensive and resource-optimized. Trojan attacks have contemporaneously gained significant prominence, revealing fundamental vulnerabilities in deep learning models. Given the widespread use of knowledge distillation, in this work we seek to exploit the unlabelled data knowledge distillation process to embed Trojans in a student model without introducing conspicuous behavior in the teacher. We ultimately devise a Trojan attack that effectively reduces student accuracy, does not alter teacher performance, and is efficiently constructible in practice.
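For readers unfamiliar with the setting, the following is a minimal sketch of a generic distillation loss on a single unlabelled input. It is not the authors' attack. The function names, the temperature value, and the example logits are illustrative assumptions and do not come from the paper. The sketch shows only that the student's training signal comes entirely from the teacher's outputs on the unlabelled data, which is the process the abstract says the attack exploits.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened output distributions.

    No ground-truth label is used: the teacher's output is the only target.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft labels
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for one unlabelled input (assumed values).
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 1.0]
loss = distillation_loss(teacher_logits, student_logits)
```

The loss is zero when the student reproduces the teacher's distribution exactly and positive otherwise, so the student is trained to copy the teacher's behaviour on whatever inputs it is given during distillation.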

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation