Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts
Danyal Aftab, Steven Davy

TL;DR
This paper introduces Tailored-LLaMA, a method for efficient few-shot learning on pruned LLaMA models that uses task-specific prompts and LoRA, achieving high accuracy with significantly reduced model sizes.
Contribution
The paper presents a novel approach combining structural pruning, task-specific prompts, and LoRA for effective fine-tuning of pruned LLaMA models in few-shot learning scenarios.
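As a rough illustration of what a task-specific few-shot prompt can look like, the sketch below assembles a classification prompt from an instruction, a few labeled examples, and a query. The template wording, the function name `build_prompt`, and the sentiment examples are all hypothetical; this summary does not reproduce the prompts used in the paper.

```python
def build_prompt(instruction, examples, query):
    """Assemble a few-shot classification prompt.

    Hypothetical template: the "Input:"/"Label:" layout is an assumption
    for illustration, not the format used in the paper.
    """
    parts = [instruction.strip(), ""]
    for text, label in examples:
        # One labeled demonstration per few-shot example.
        parts.append(f"Input: {text}")
        parts.append(f"Label: {label}")
        parts.append("")
    # The query is left unlabeled so the model completes the label.
    parts.append(f"Input: {query}")
    parts.append("Label:")
    return "\n".join(parts)


# Hypothetical sentiment-classification task.
demo_prompt = build_prompt(
    "Classify the sentiment of each input as positive or negative.",
    [("The film was a delight.", "positive"),
     ("I want my money back.", "negative")],
    "The plot dragged, but the acting was superb.",
)
```

Keeping the instruction and the demonstrations as separate arguments makes it easy to swap in a different task without touching the template logic.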
Findings
Fine-tuning pruned models restores high accuracy in classification tasks.
Models retain over 65% of baseline accuracy after 50% pruning.
Fine-tuning for less than one hour achieves near-baseline performance.
Abstract
Large language models demonstrate impressive proficiency in language understanding and generation. Nonetheless, training these models from scratch, even the least complex billion-parameter variant, demands significant computational resources, rendering it economically impractical for many organizations. With large language models functioning as general-purpose task solvers, this paper investigates their task-specific fine-tuning. We employ task-specific datasets and prompts to fine-tune two pruned LLaMA models with 5 billion and 4 billion parameters. This process utilizes the pre-trained weights and focuses on a subset of weights using the LoRA method. One challenge in fine-tuning the LLaMA model is crafting a precise prompt tailored to the specific task. To address this, we propose a novel approach to fine-tune the LLaMA model under two primary constraints: task specificity and prompt…
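To make the phrase "focuses on a subset of weights using the LoRA method" concrete, here is a minimal pure-Python sketch of the standard LoRA formulation: the pre-trained weight W stays frozen, and only a low-rank pair B and A is trained, giving an effective weight W + (alpha / r) * B A. The layer size, rank, and alpha below are illustrative assumptions; this summary does not state the values or target layers used in the paper.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # updating W directly
    lora = rank * (d_in + d_out)   # B is d_out x rank, A is rank x d_in
    return full, lora


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(W, A, B, alpha, rank):
    """Compute W + (alpha / rank) * (B @ A); W itself is never modified."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Illustrative sizes only (assumed, not taken from the paper).
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
trainable_fraction = lora / full
```

With B initialized to zeros, as in the standard LoRA setup, the effective weight equals W at the start of fine-tuning, so training begins from the pruned model's existing behavior.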
Taxonomy
Methods: Pruning · LLaMA
