Optimizing Language Models for Grammatical Acceptability: A Comparative Study of Fine-Tuning Techniques

Shobhit Ratan; Farley Knight; Ghada Jerfel; Sze Chung Ho

arXiv:2501.07853·cs.CL·January 15, 2025

Open Access

TL;DR

This paper compares various fine-tuning methods for language models on grammatical acceptability, highlighting efficiency gains and accuracy trade-offs, with a focus on democratizing access to large models.

Contribution

It presents a comparative analysis of fine-tuning techniques for OPT-125M on CoLA, emphasizing parameter-efficient methods like LoRA that improve efficiency while maintaining high accuracy.

Findings

01. LoRA reduces memory usage and iteration time by more than 50%.

02. VFT achieves the highest accuracy at 81.2%.

03. Context Distillation underperforms, with accuracy around 31%.

Abstract

This study explores the fine-tuning (FT) of the Open Pre-trained Transformer (OPT-125M) for grammatical acceptability tasks using the CoLA dataset. By comparing Vanilla-Fine-Tuning (VFT), Pattern-Based-Fine-Tuning (PBFT), and Parameter-Efficient Fine-Tuning (PEFT) techniques like Low-Rank Adaptation (LoRA), we demonstrate significant improvements in computational efficiency while maintaining high accuracy. Our experiments reveal that while VFT achieves the highest accuracy (81.2%), LoRA enhances FT by reducing memory usage and iteration time by more than 50% and increases accuracy in the PBFT case. Context Distillation (CD), though computationally efficient, underperformed, with accuracy around 31%. Our findings contribute to democratizing access to large language models (LLMs) by reducing computational barriers.
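The efficiency argument for LoRA can be made concrete with a minimal sketch of the low-rank update, in which a frozen weight matrix W is augmented by a trainable product B·A scaled by alpha / rank. This is an illustration only, not the authors' implementation: the rank of 8 and the scaling factor are hypothetical values chosen for the example, and the 768-dimensional projection is assumed from the published OPT-125M configuration rather than taken from this abstract. The parameter counts below do not by themselves reproduce the measured memory and iteration-time reductions reported above.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for a full weight update vs. a LoRA update."""
    full = d_in * d_out            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # A is (rank x d_in), B is (d_out x rank)
    return full, lora


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / rank) * B (A x); W stays frozen in training."""
    rank = len(A)
    base = matvec(W, x)                 # frozen pre-trained path
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Assumed 768 x 768 projection and a hypothetical rank of 8.
full, lora = lora_param_counts(768, 768, 8)
print(full, lora, f"{lora / full:.2%}")

# Tiny worked example: with B initialised to zeros, the adapted layer
# initially behaves exactly like the frozen one.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 1.0]]          # rank 1
x = [1.0, 1.0]
print(lora_forward(W, A, [[0.0], [0.0]], x, alpha=1.0))
print(lora_forward(W, A, [[1.0], [2.0]], x, alpha=1.0))
```

Under these assumed dimensions, the low-rank update trains roughly 2% of the parameters of the full matrix, which is the kind of saving that parameter-efficient methods rely on.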

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems

Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · COLA · Multi-Head Attention