Elastic Architecture Search for Efficient Language Models
Shang Wang

TL;DR
This paper presents Elastic Language Model (ELM), a neural architecture search method that creates efficient, flexible transformer-based language models, reducing computational costs while maintaining high performance on language tasks.
Contribution
The paper introduces ELM, a NAS approach with a flexible search space and novel distillation losses, improving the efficiency and effectiveness of designing compact language models.
Findings
Models discovered by ELM outperform existing methods on language modeling tasks.
ELM reduces computational and memory requirements for large language models.
Dynamic modules that adjust dimension and head number allow a more thorough exploration of model architectures.
Abstract
As large pre-trained language models become increasingly critical to natural language understanding (NLU) tasks, their substantial computational and memory requirements have raised significant economic and environmental concerns. Addressing these challenges, this paper introduces the Elastic Language Model (ELM), a novel neural architecture search (NAS) method optimized for compact language models. ELM extends existing NAS approaches by introducing a flexible search space with efficient transformer blocks and dynamic modules for dimension and head number adjustment. These innovations enhance the efficiency and flexibility of the search process, which facilitates more thorough and effective exploration of model architectures. We also introduce novel knowledge distillation losses that preserve the unique characteristics of each block, in order to improve the discrimination between…
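To make the idea of a search space with adjustable dimension and head number more concrete, here is a minimal sketch of how candidate architectures could be sampled from such a space. It is not ELM's actual search procedure: the candidate values, the `LayerConfig` class, and the function names are all hypothetical, and plain random sampling stands in for whatever search strategy the paper uses.

```python
import random
from dataclasses import dataclass

# Hypothetical candidate values; the paper's actual search space is not given here.
HIDDEN_DIMS = (128, 256, 384, 512)
HEAD_COUNTS = (2, 4, 8)


@dataclass(frozen=True)
class LayerConfig:
    """One transformer layer choice: hidden dimension and attention head count."""
    hidden_dim: int
    num_heads: int


def sample_layer(rng: random.Random) -> LayerConfig:
    """Pick a hidden dimension, then a head count that divides it evenly."""
    hidden_dim = rng.choice(HIDDEN_DIMS)
    # Multi-head attention splits the hidden dimension across heads,
    # so only head counts that divide it are valid.
    valid_heads = [h for h in HEAD_COUNTS if hidden_dim % h == 0]
    return LayerConfig(hidden_dim=hidden_dim, num_heads=rng.choice(valid_heads))


def sample_architecture(num_layers: int, seed=None) -> list:
    """Sample one candidate architecture as a list of per-layer configs."""
    rng = random.Random(seed)
    return [sample_layer(rng) for _ in range(num_layers)]


if __name__ == "__main__":
    for layer in sample_architecture(num_layers=4, seed=0):
        print(layer)
```

Each layer is sampled independently, so a single candidate can mix narrow and wide layers. A real NAS method would score such candidates (for example by accuracy and cost) rather than only enumerate them.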
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
