TELL-TALE: Task Efficient LLMs with Task Aware Layer Elimination

Omar Naim; Krish Sharma; Niyar R Barman; Nicholas Asher

arXiv:2510.22767·cs.LG·May 12, 2026

TL;DR

TALE (Task-Aware Layer Elimination) is an inference-time method that removes layers that are irrelevant or detrimental to a given task, matching or improving task performance while reducing computational cost, all without retraining.

Contribution

Introduces TALE, a practical inference-time layer elimination technique that yields a task-optimized LLM architecture without retraining, evaluated across 9 tasks and 5 model families.

Findings

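01. TALE matches or surpasses baseline performance across 9 tasks and 5 model families, in both zero-shot and few-shot settings.

02. TALE reduces computational cost at inference time by removing layers.

03. TALE combines with fine-tuning for further performance improvements.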

Abstract

Large Language Models (LLMs) typically come with a fixed architecture, despite growing evidence that not all layers contribute equally to every downstream task. We introduce TALE (Task-Aware Layer Elimination), an inference-time method that improves task performance by selectively removing layers that are irrelevant or detrimental for a given task. TALE optimizes task-specific performance, yielding a task-optimized architecture without retraining. Across 9 tasks and 5 model families, under both zero-shot and few-shot settings, TALE consistently matches or surpasses baseline performance while simultaneously reducing computational costs. TALE also synergizes with fine-tuning, leading to further performance improvements. Computing TALE for a new task requires modest resources, making it a practical and deployable solution for task-specialized LLM inference.
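The abstract does not spell out the search procedure, so the following is only a minimal sketch of one plausible way to do task-aware layer elimination: a greedy loop that drops a layer whenever doing so matches or improves a validation score. Everything here is an assumption made for illustration, including the names `forward`, `accuracy`, and `greedy_layer_elimination`, the toy scalar "layers", and the acceptance rule; it is not the authors' implementation. A skipped layer is modelled as an identity step on the residual stream.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy layer: maps the hidden state to a residual update.
Layer = Callable[[float], float]
Example = Tuple[float, float]  # (input, target)


def forward(layers: Sequence[Layer], active: Sequence[int], x: float) -> float:
    """Run only the active layers; a removed layer acts as the identity."""
    for i, layer in enumerate(layers):
        if i in active:
            x = x + layer(x)  # residual connection
    return x


def accuracy(layers: Sequence[Layer], active: Sequence[int],
             data: Sequence[Example]) -> float:
    """Fraction of validation examples predicted (numerically) exactly."""
    hits = sum(abs(forward(layers, active, x) - y) < 1e-9 for x, y in data)
    return hits / len(data)


def greedy_layer_elimination(
    layers: Sequence[Layer], data: Sequence[Example]
) -> Tuple[List[int], List[int], float]:
    """Assumed greedy rule: repeatedly remove the single layer whose removal
    gives the best validation score, as long as the score does not drop."""
    active = list(range(len(layers)))
    best = accuracy(layers, active, data)
    removed: List[int] = []
    while len(active) > 1:
        # Score every one-layer-removed candidate architecture.
        candidates = [
            (accuracy(layers, [j for j in active if j != i], data), i)
            for i in active
        ]
        cand_score, cand_idx = max(candidates, key=lambda t: t[0])
        if cand_score < best:
            break  # every further removal would hurt the task
        active.remove(cand_idx)
        removed.append(cand_idx)
        best = cand_score
    return active, removed, best


# Toy task: the target is x + 1.
toy_layers: List[Layer] = [
    lambda x: 1.0,   # useful for the task
    lambda x: -0.5,  # detrimental for the task
    lambda x: 0.0,   # irrelevant for the task
]
val_set: List[Example] = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]

kept, removed, score = greedy_layer_elimination(toy_layers, val_set)
print(kept, removed, score)  # [0] [1, 2] 1.0
```

In this toy setting the full stack scores 0.0, removing the detrimental layer raises the score to 1.0, and the irrelevant layer is then dropped because the score is unchanged. With a real model, the scoring call would be a task evaluation on held-out data and each candidate would be a forward pass that skips the chosen transformer blocks.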
