Knowledge Distillation for Multi-task Learning
Wei-Hong Li, Hakan Bilen

TL;DR
This paper introduces a knowledge distillation approach for multi-task learning that addresses the imbalance among task losses by training the multi-task model to match the features of task-specific models through small task-specific adaptors.
Contribution
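It proposes first training a task-specific model for each task, then training the multi-task model to minimize the task-specific losses while small task-specific adaptors align its shared features with those of the task-specific models, addressing the imbalance problem in multi-task learning.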
Findings
Improved multi-task model performance across various tasks.
Effective balancing of task-specific and shared features.
Demonstrated superiority over existing methods in experiments.
Abstract
Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all of them at a lower computational cost. Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics (e.g. cross-entropy, Euclidean loss), which leads to the imbalance problem in multi-task learning. To address the imbalance problem, we propose a knowledge-distillation-based method in this work. We first learn a task-specific model for each task. We then learn the multi-task model to minimize the task-specific losses and to produce the same features as the task-specific models. As each task-specific network encodes different features, we introduce small task-specific adaptors to project the multi-task features to the task-specific features. In this way, the adaptors align the task-specific feature and…
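The training objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear form of the adaptors, the squared Euclidean distance used as the feature-matching term, the `distill_weight` factor, and all function names are assumptions made for illustration. The task-specific (teacher) features are treated as fixed targets, matching the two-stage procedure in which the task-specific models are learned first.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def apply_adaptor(weights: Matrix, feature: Sequence[float]) -> Vector:
    """Project a shared multi-task feature with a linear adaptor (assumed form)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multitask_distillation_loss(
    shared_feature: Sequence[float],
    task_losses: Sequence[float],
    teacher_features: Sequence[Sequence[float]],
    adaptors: Sequence[Matrix],
    distill_weight: float = 1.0,  # hypothetical balancing factor
) -> float:
    """Sum, over tasks, of the task loss plus a feature-matching term.

    The feature-matching term compares the adapted multi-task feature with
    the feature produced by the frozen task-specific model for that task.
    """
    total = 0.0
    for task_loss, teacher, adaptor in zip(task_losses, teacher_features, adaptors):
        adapted = apply_adaptor(adaptor, shared_feature)
        total += task_loss + distill_weight * squared_distance(adapted, teacher)
    return total


# Toy example with two tasks and a 2-dimensional shared feature.
shared = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]  # adaptor for task 1
swap = [[0.0, 1.0], [1.0, 0.0]]      # adaptor for task 2
teachers = [[1.0, 2.0], [2.0, 1.0]]  # features from the task-specific models
loss = multitask_distillation_loss(shared, [0.5, 0.25], teachers, [identity, swap])
```

In this toy case each adaptor maps the shared feature exactly onto its teacher feature, so the feature-matching terms vanish and only the task losses remain. Because every task has its own adaptor, the shared feature does not have to equal any single teacher feature, which is the role the abstract assigns to the adaptors.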
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
