Optimizing Language Models for Grammatical Acceptability: A Comparative Study of Fine-Tuning Techniques
Shobhit Ratan, Farley Knight, Ghada Jerfel, Sze Chung Ho

TL;DR
This paper compares various fine-tuning methods for language models on grammatical acceptability, highlighting efficiency gains and accuracy trade-offs, with a focus on democratizing access to large models.
Contribution
It introduces a comparative analysis of fine-tuning techniques, emphasizing parameter-efficient methods like LoRA for improved efficiency without sacrificing accuracy.
Findings
LoRA reduces memory usage and iteration time by more than 50%.
VFT achieves the highest accuracy at 81.2%.
Context Distillation underperforms, with accuracy around 31%.
Abstract
This study explores the fine-tuning (FT) of the Open Pre-trained Transformer (OPT-125M) for grammatical acceptability tasks using the CoLA dataset. By comparing Vanilla-Fine-Tuning (VFT), Pattern-Based-Fine-Tuning (PBFT), and Parameter-Efficient Fine-Tuning (PEFT) techniques like Low-Rank Adaptation (LoRA), we demonstrate significant improvements in computational efficiency while maintaining high accuracy. Our experiments reveal that while VFT achieves the highest accuracy (81.2%), LoRA enhances FT by reducing memory usage and iteration time by more than 50% and increases accuracy in the PBFT case. Context Distillation (CD), though computationally efficient, underperformed with accuracy around 31%. Our findings contribute to democratizing access to large language models (LLMs) by reducing computational barriers.
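To make the efficiency argument for LoRA concrete, here is a minimal sketch of the low-rank update applied to a single linear layer, written in plain Python. It is an illustration of the general LoRA idea, not the authors' training code. The layer width of 768, the rank of 8, the scaling factor alpha, and all function names are assumptions chosen for the example; the abstract does not state the configuration used.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x).

    W is the frozen pretrained weight; only A and B would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


# Toy dimensions (hypothetical) so the forward pass is cheap to run.
random.seed(0)
d_in, d_out, rank, alpha = 6, 4, 2, 4.0
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # zero init: no change at step 0
x = [random.gauss(0, 1) for _ in range(d_in)]

y0 = lora_forward(W, A, B, x, alpha, rank)
assert y0 == matvec(W, x)  # with B = 0 the adapted layer equals the base layer

# Assumed 768 x 768 projection with an assumed rank of 8.
full, lora = trainable_params(768, 768, 8)
print(full, lora, full // lora)
```

For the assumed 768 x 768 matrix, the low-rank factors hold 12,288 trainable values against 589,824 for the full matrix, a 48-fold reduction for that one layer. This count covers trainable parameters only; the memory and iteration-time reductions of more than 50% reported above are the paper's measured results and are not derived from this arithmetic.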
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems
Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · COLA · Multi-Head Attention
