LPZero: Language Model Zero-cost Proxy Search from Zero
Peijie Dong, Lujun Li, Xiang Liu, Zhenheng Tang, Xuebo Liu, Qiang, Wang, Xiaowen Chu

TL;DR
LPZero introduces an automated framework for designing zero-cost proxies in neural architecture search, significantly improving ranking accuracy and reducing reliance on expert knowledge in NLP tasks.
Contribution
It is the first to automatically generate ZC proxies using symbolic equations and genetic programming, outperforming human-designed proxies across multiple NLP models.
Findings
LPZero achieves higher ranking consistency than existing proxies.
It demonstrates superior downstream task performance on FlexiBERT, GPT-2, and LLaMA-7B.
The framework effectively reduces trial-and-error in proxy design.
Abstract
In spite of the outstanding performance, Neural Architecture Search (NAS) is criticized for massive computation. Recently, Zero-shot NAS has emerged as a promising approach by exploiting Zero-cost (ZC) proxies, which markedly reduce computational demands. Despite this, existing ZC proxies heavily rely on expert knowledge and incur significant trial-and-error costs. Particularly in NLP tasks, most existing ZC proxies fail to surpass the performance of the naive baseline. To address these challenges, we introduce a novel framework, \textbf{LPZero}, which is the first to automatically design ZC proxies for various tasks, achieving higher ranking consistency than human-designed proxies. Specifically, we model the ZC proxy as a symbolic equation and incorporate a unified proxy search space that encompasses existing ZC proxies, which are composed of a predefined set of mathematical symbols.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Residual Connection · Weight Decay · Cosine Annealing · Dropout · Byte Pair Encoding · Softmax
