Meta-Ensemble Parameter Learning
Zhengcong Fei, Shuman Tian, Junshi Huang, Xiaoming Wei, Xiaolin Wei

TL;DR
This paper introduces WeightFormer, a Transformer-based model that predicts neural network weights directly from teacher models, achieving ensemble-like performance with improved efficiency and scalability.
Contribution
We propose WeightFormer, a novel meta-learning approach that predicts network parameters layer-by-layer, enabling efficient ensemble approximation without re-training.
Findings
WeightFormer achieves ensemble-level accuracy on CIFAR-10, CIFAR-100, and ImageNet.
WeightFormer outperforms standard knowledge distillation and single networks.
Its performance improves further with minor fine-tuning.
Abstract
Ensembles of machine learning models yield improved performance as well as robustness. However, their memory requirements and inference costs can be prohibitively high. Knowledge distillation allows a single model to efficiently approximate the performance of an ensemble, but it scales poorly because the student must be re-trained whenever new teacher models are introduced. In this paper, we study whether a meta-learning strategy can directly predict the parameters of a single model whose performance is comparable to that of an ensemble. To this end, we introduce WeightFormer, a Transformer-based model that predicts student network weights layer by layer in a forward pass, according to the teacher model parameters. The properties of WeightFormer are investigated on the CIFAR-10, CIFAR-100, and ImageNet datasets for model structures of VGGNet-11, ResNet-50, and ViT-B/32,…
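To make the goal of "one model whose parameters are derived from the teachers' parameters" concrete, the sketch below covers only the linear special case. It is not WeightFormer and is not taken from the paper. For purely linear layers, averaging the teachers' parameters reproduces the averaged ensemble output exactly, so a single forward pass suffices. This identity generally fails once nonlinearities are involved, which is one way to see why a learned parameter predictor could be useful. All names here (`linear`, `average_parameters`, `ensemble_output`, `make_teacher`) are hypothetical helpers introduced for illustration.

```python
import random

def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def average_parameters(teachers):
    """Element-wise mean of the teachers' (W, b) pairs (linear case only)."""
    n = len(teachers)
    rows, cols = len(teachers[0][0]), len(teachers[0][0][0])
    mean_w = [[sum(t[0][r][c] for t in teachers) / n for c in range(cols)]
              for r in range(rows)]
    mean_b = [sum(t[1][r] for t in teachers) / n for r in range(rows)]
    return mean_w, mean_b

def ensemble_output(teachers, x):
    """Mean of the teachers' outputs: one forward pass per teacher."""
    outs = [linear(w, b, x) for w, b in teachers]
    return [sum(o[i] for o in outs) / len(outs) for i in range(len(outs[0]))]

def make_teacher(rng, rows, cols):
    """Random linear 'teacher' (assumption: stands in for a trained model)."""
    w = [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    b = [rng.uniform(-1, 1) for _ in range(rows)]
    return w, b

rng = random.Random(0)
teachers = [make_teacher(rng, 3, 4) for _ in range(5)]
x = [rng.uniform(-1, 1) for _ in range(4)]

# Single "student" built directly from teacher parameters, no re-training.
student_w, student_b = average_parameters(teachers)
student_out = linear(student_w, student_b, x)

# Reference: the full ensemble, which needs five forward passes.
ensemble_out = ensemble_output(teachers, x)
max_gap = max(abs(s - e) for s, e in zip(student_out, ensemble_out))
```

Here `max_gap` is zero up to floating-point rounding. With nonlinear teachers, plain parameter averaging carries no such guarantee, so the mapping from teacher parameters to student parameters would have to be learned instead.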
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)
Methods: Knowledge Distillation
