MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers
Mohammadmahdi Nouriborji, Omid Rohanian, Samaneh Kouchaki, David A. Clifton

TL;DR
MiniALBERT introduces a novel approach combining model distillation and cross-layer parameter sharing to create compact, efficient language models that maintain high performance across various NLP tasks.
Contribution
The paper proposes MiniALBERT, a new method that integrates model distillation with recursive transformers and adapter tuning for efficient NLP models.
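As a rough illustration of why a recursive (cross-layer shared) stack is smaller than a conventional one, the sketch below compares parameter counts for a toy network. It is not the MiniALBERT architecture: the helpers `make_layer`, `apply_layer`, `forward`, and `count_params`, the toy residual layer, and the sizes `dim` and `depth` are all assumptions chosen for illustration.

```python
import math
import random


def make_layer(dim, rng):
    """Create one toy layer: a dim x dim weight matrix plus a bias vector."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return {"weights": weights, "bias": bias}


def apply_layer(layer, x):
    """Toy residual layer, x + tanh(Wx + b), standing in for a transformer block."""
    out = []
    for i, row in enumerate(layer["weights"]):
        pre = sum(w * v for w, v in zip(row, x)) + layer["bias"][i]
        out.append(x[i] + math.tanh(pre))
    return out


def forward(layers, x):
    """Apply the layers in order; a shared layer is simply applied repeatedly."""
    for layer in layers:
        x = apply_layer(layer, x)
    return x


def count_params(layers):
    """Count parameters over distinct layer objects only (shared layers count once)."""
    distinct = {id(layer): layer for layer in layers}
    return sum(
        len(l["weights"]) * len(l["weights"][0]) + len(l["bias"])
        for l in distinct.values()
    )


rng = random.Random(0)
dim, depth = 8, 6  # hypothetical sizes

# Conventional stack: every layer has its own weights.
unshared = [make_layer(dim, rng) for _ in range(depth)]

# Recursive stack: one set of weights reused at every depth.
shared_layer = make_layer(dim, rng)
recursive = [shared_layer] * depth

unshared_params = count_params(unshared)
recursive_params = count_params(recursive)
output = forward(recursive, [0.5] * dim)
```

In this toy setting the recursive stack performs the same number of layer applications as the unshared one while storing the weights of a single layer, so its parameter count is smaller by a factor of `depth`.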
Findings
MiniALBERT achieves competitive performance on NLP tasks.
The approach reduces model size and computational complexity.
Code and models are publicly available for reproducibility.
Abstract
Pre-trained Language Models (LMs) have become an integral part of Natural Language Processing (NLP) in recent years, due to their superior performance in downstream applications. In spite of this resounding success, the usability of LMs is constrained by computational and time complexity, along with their increasing size; an issue that has been referred to as 'overparameterisation'. Different strategies have been proposed in the literature to alleviate these problems, with the aim to create effective compact models that nearly match the performance of their bloated counterparts with negligible performance losses. One of the most popular techniques in this area of research is model distillation. Another potent but underutilised technique is cross-layer parameter sharing. In this work, we combine these two strategies and present MiniALBERT, a technique for converting the knowledge of…
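For readers unfamiliar with model distillation, the following is a minimal sketch of a generic soft-target distillation loss, in which a student is trained to match a teacher's temperature-softened output distribution. It is a common textbook formulation and is not claimed to be the objective used in the paper; the function names and the default `temperature` are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [2.0, 0.5, -1.0]        # hypothetical teacher logits
student_far = [0.0, 0.0, 0.0]     # student that has learned nothing yet
student_near = [1.9, 0.6, -1.1]   # student close to the teacher

loss_far = distillation_loss(teacher, student_far)
loss_near = distillation_loss(teacher, student_near)
loss_same = distillation_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two softened distributions diverge, which is the signal a distillation procedure minimises.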
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
Methods: Test · Adapter
