TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination
Omar Naim, Krish Sharma, Niyar R Barman, Nicholas Asher

TL;DR
TALE is an inference-time method that removes layers that are irrelevant or detrimental to a given task, improving task performance and reducing computational cost without retraining.
Contribution
Introduces TALE, a practical inference-time layer elimination technique that yields a task-optimized LLM architecture without retraining, evaluated across 9 tasks and 5 model families.
Findings
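TALE matches or surpasses baseline performance across 9 tasks and 5 model families, in both zero-shot and few-shot settings.
TALE reduces computational cost during inference, since the task-specific model runs fewer layers.
Combining TALE with fine-tuning yields further performance improvements.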
Abstract
Large Language Models (LLMs) typically come with a fixed architecture, despite growing evidence that not all layers contribute equally to every downstream task. We introduce TALE (Task-Aware Layer Elimination), an inference-time method that improves task performance by selectively removing layers that are irrelevant or detrimental for a given task. TALE optimizes task-specific performance, yielding a task-optimized architecture without retraining. Across 9 tasks and 5 model families, under both zero-shot and few-shot settings, TALE consistently matches or surpasses baseline performance while simultaneously reducing computational costs. TALE also synergizes with fine-tuning, leading to further performance improvements. Computing TALE for a new task requires modest resources, making it a practical and deployable solution for task-specialized LLM inference.
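The abstract does not spell out how layers are selected, so the following is only a minimal sketch of one plausible reading: a greedy search that repeatedly drops whichever layer's removal gives the best task score, and stops once every removal would hurt. The greedy strategy, the function names (`greedy_layer_elimination`, `score`), and the toy "model" of additive residual layers are all assumptions made for illustration and are not taken from the paper. In a real setting, the score function would be something like validation accuracy of the LLM with the chosen transformer blocks skipped.

```python
from typing import Callable, List, Tuple


def greedy_layer_elimination(
    n_layers: int,
    score_fn: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Greedily drop layers while the task score does not decrease.

    Hypothetical sketch: score_fn takes the indices of the layers that
    are kept and returns a task score (higher is better).
    """
    kept = list(range(n_layers))
    best = score_fn(kept)
    while len(kept) > 1:
        # Score every candidate obtained by removing one remaining layer.
        candidates = [
            (score_fn(kept[:j] + kept[j + 1:]), j) for j in range(len(kept))
        ]
        cand_score, j = max(candidates, key=lambda c: c[0])
        if cand_score < best:
            break  # every single-layer removal hurts: stop
        # Ties are accepted, since a smaller model with equal score is cheaper.
        best = cand_score
        kept = kept[:j] + kept[j + 1:]
    return kept, best


# Toy stand-in for a model (an assumption, not an LLM): each "layer" adds
# a fixed amount to a residual stream, and the task wants the output 6.0.
deltas = [1.0, 2.0, -0.5, 3.0, 0.0]
target = 6.0


def score(kept_layers: List[int]) -> float:
    output = 0.0
    for i in kept_layers:
        output += deltas[i]  # residual-style update
    return -abs(output - target)  # higher is better; 0 is a perfect match


kept, best = greedy_layer_elimination(len(deltas), score)
print("kept layers:", kept, "score:", best)
```

In this toy, the layer with a negative contribution is detrimental and the layer with zero contribution is irrelevant, which mirrors the two kinds of layers the abstract says TALE removes. Each greedy step costs one evaluation per remaining layer, so the search cost grows roughly quadratically with depth under this assumed procedure.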
