LoRA Meets Dropout under a Unified Framework
Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu

TL;DR
This paper investigates the effectiveness of dropout methods in parameter-efficient finetuning of large language models, introduces a unified framework for their comparison, and proposes a new method called HiddenKey that outperforms existing techniques.
Contribution
It establishes the equivalences and distinctions among transformer-specific dropout methods, develops a unified framework for analyzing them, and introduces HiddenKey, a novel dropout method that improves finetuning performance.
Findings
HiddenKey outperforms existing dropout methods across multiple models and tasks.
Parameter-efficient LoRA is also prone to overfitting, similar to full finetuning.
The unified framework reveals new preferences and insights for dropout methods when only a limited number of parameters are trainable.
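The LoRA finding above can be made concrete with a small sketch. This is a pure-Python illustration, not the paper's code. The class LoRALinear, the helper names, and the choice to apply dropout only to the input of the low-rank branch (a common convention in LoRA implementations) are assumptions. It shows how few parameters are trainable relative to the frozen weight, which is what makes LoRA's susceptibility to overfitting notable.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dropout(v, p, rng, training=True):
    """Inverted dropout: zero each entry with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(v)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in v]


class LoRALinear:
    """Hypothetical LoRA layer: y = W0 x + (alpha / r) * B (A dropout(x))."""

    def __init__(self, w0, r, alpha, p, seed=0):
        self.rng = random.Random(seed)
        d_out, d_in = len(w0), len(w0[0])
        self.w0 = w0  # frozen pretrained weight, shape (d_out, d_in)
        # Trainable low-rank factors: A is (r, d_in), B is (d_out, r).
        self.a = [[self.rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]  # zero init: the update starts at 0
        self.scale = alpha / r
        self.p = p

    def forward(self, x, training=True):
        base = matvec(self.w0, x)  # frozen path, no dropout
        x_drop = dropout(x, self.p, self.rng, training)  # dropout only feeds the LoRA branch
        delta = matvec(self.b, matvec(self.a, x_drop))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def num_trainable(self):
        """Count entries of A and B, the only parameters that would be updated."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])
```

With an 8x8 frozen weight and rank r = 2, only 32 entries are trainable against 64 frozen ones, and the gap widens quickly for realistic hidden sizes. Because B starts at zero, the layer initially reproduces W0 x exactly.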
Abstract
With their remarkable capabilities, large language models (LLMs) have become essential components of numerous NLP applications, while parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach to model customization. Meanwhile, various dropout methods, initially designed for full finetuning with all parameters updated, alleviate the overfitting associated with excessive parameter redundancy. Hence, a possible contradiction arises between the negligible number of trainable parameters in LoRA and the effectiveness of previous dropout methods, which has been largely overlooked. To fill this gap, we first confirm that parameter-efficient LoRA is also overfitting-prone. We then revisit transformer-specific dropout methods and establish their equivalences and distinctions both mathematically and empirically. Building upon this comparative analysis, we introduce a unified…
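The abstract refers to transformer-specific dropout methods whose equivalences and distinctions the paper analyzes. As a hedged illustration of how two dropping schemes inside attention can differ, the sketch below contrasts standard dropout applied to one row of attention weights after softmax with masking key logits to -inf before softmax, the idea usually associated with DropKey. The method labels, function names, and the all-dropped fallback are assumptions and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax; -inf logits receive zero weight."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout_after_softmax(logits, p, rng):
    """Standard dropout on attention weights: unbiased, but the row no longer sums to 1."""
    weights = softmax(logits)
    keep = 1.0 - p
    return [w / keep if rng.random() < keep else 0.0 for w in weights]


def drop_logits_before_softmax(logits, p, rng):
    """DropKey-style masking (assumed): drop keys by setting their logits to -inf, then renormalize."""
    masked = [z if rng.random() >= p else -math.inf for z in logits]
    if all(z == -math.inf for z in masked):
        # Guard (our choice): if every key was dropped, fall back to the unmasked row.
        masked = list(logits)
    return softmax(masked)
```

After-softmax dropout keeps each weight correct in expectation but leaves the row unnormalized, whereas masking before softmax always returns a valid distribution over the surviving keys. Differences of this kind are what a unified comparison of dropout methods has to account for.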
Taxonomy
Topics: Robotics and Automated Systems · Distributed and Parallel Computing Systems · Context-Aware Activity Recognition Systems
Methods: Dropout
