TAPIOCA: Why Task-Aware Pruning Improves OOD Model Capability

Krish Sharma; Omar Naim; Soumadeep Saha; Vinija Jain; Aman Chadha; Nicholas Asher

arXiv:2605.14738 · cs.LG · May 22, 2026

TL;DR

This paper investigates task-aware layer pruning and shows that it does not improve in-distribution (ID) performance but does improve out-of-distribution (OOD) accuracy, by realigning the geometry of OOD representations with the task-adapted geometry observed on ID inputs.

Contribution

It provides a geometric explanation for why task-aware pruning is effective on OOD data and demonstrates consistent OOD improvements across large language models and controlled polynomial regression experiments.

Findings

1. Pruning improves OOD accuracy but not ID performance.
2. OOD inputs distort the representation geometry (layerwise norm and pairwise-distance profiles) relative to ID inputs.
3. Removing the layers that create or amplify this distortion realigns OOD representations with the task-adapted geometry.
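Findings 2 and 3 rest on comparing layerwise representation profiles between ID and OOD inputs. The following is a minimal NumPy sketch of how such profiles might be computed and compared, assuming hidden states are available as a (layers, inputs, dim) array. The function names, the mean-based summaries, the relative-deviation score, and the toy residual network are all hypothetical illustrations, not the paper's code.

```python
import numpy as np

def layerwise_profiles(hidden_states):
    """Return (norm_profile, distance_profile), each of shape (num_layers,).

    hidden_states: array of shape (num_layers, num_inputs, dim) holding each
    input's representation at each layer (a hypothetical layout).
    """
    hs = np.asarray(hidden_states, dtype=float)
    n = hs.shape[1]
    # Mean L2 norm of the representations at each layer.
    norm_profile = np.linalg.norm(hs, axis=-1).mean(axis=1)
    # Euclidean distance between every pair of inputs at each layer: (L, n, n).
    dists = np.linalg.norm(hs[:, :, None, :] - hs[:, None, :, :], axis=-1)
    # Average over distinct pairs i < j only.
    i, j = np.triu_indices(n, k=1)
    distance_profile = dists[:, i, j].mean(axis=1)
    return norm_profile, distance_profile

def profile_deviation(id_profile, ood_profile, eps=1e-12):
    """Per-layer relative deviation of an OOD profile from the ID reference."""
    id_profile = np.asarray(id_profile, dtype=float)
    ood_profile = np.asarray(ood_profile, dtype=float)
    return np.abs(ood_profile - id_profile) / (np.abs(id_profile) + eps)

# Toy demo: a hypothetical 4-layer residual network with random weights.
rng = np.random.default_rng(0)
num_layers, dim = 4, 8
weights = [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(num_layers)]

def hidden_states_of(x):
    """Stack the hidden state after every layer: (num_layers, n, dim)."""
    states, h = [], x
    for w in weights:
        h = h + np.tanh(h @ w)  # residual update
        states.append(h)
    return np.stack(states)

x_id = rng.uniform(-1.0, 1.0, size=(32, dim))   # stand-in "ID" inputs
x_ood = rng.uniform(-3.0, 3.0, size=(32, dim))  # wider input range as stand-in "OOD"
id_norms, id_dists = layerwise_profiles(hidden_states_of(x_id))
ood_norms, ood_dists = layerwise_profiles(hidden_states_of(x_ood))
print("norm deviation per layer:", profile_deviation(id_norms, ood_norms))
print("distance deviation per layer:", profile_deviation(id_dists, ood_dists))
```

The pairwise-distance tensor costs O(L·n²·d) memory, which is fine for small probe sets; for large ones, a subsample of input pairs would be a reasonable substitute.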

Abstract

Recent work, notably TALE, has promoted task-aware layer pruning as a way to improve model performance on particular tasks. In this paper, we investigate when such improvements occur and why. We first show that, across controlled polynomial regression tasks and large language models, such pruning yields no benefit on in-distribution (ID) data but consistently improves out-of-distribution (OOD) accuracy. We further show empirically that OOD inputs induce layerwise norm and pairwise-distance profiles that deviate from the corresponding ID profiles. This leads to a geometric explanation of task-aware pruning: each task induces a task-adapted geometry, characterized empirically by the representation profiles observed on ID inputs, and OOD inputs can induce a distorted version of this geometry. Task-aware pruning identifies layers that create or amplify this distortion; by…
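Task-aware pruning selects which layers to drop using performance on a target task. The sketch below is a generic greedy variant, assuming a user-supplied `score` function (hypothetical) that evaluates the model restricted to a given list of layers on held-out task data. Which data split drives the selection is an assumption here, and this loop is not claimed to be TALE's exact selection rule or the procedure used in this paper.

```python
from typing import Callable, List, Sequence

def greedy_layer_pruning(
    num_layers: int,
    score: Callable[[Sequence[int]], float],
    min_gain: float = 0.0,
) -> List[int]:
    """Greedily remove layers while the task score improves by more than min_gain.

    score(kept) is a hypothetical evaluator: it runs the model using only the
    layer indices in `kept` (in order) on held-out task data and returns a
    higher-is-better metric such as accuracy. Returns the kept layer indices.
    """
    kept = list(range(num_layers))
    best = score(kept)
    while len(kept) > 1:
        # Score every model obtained by dropping exactly one remaining layer.
        candidates = [(score([l for l in kept if l != layer]), layer) for layer in kept]
        trial_score, layer = max(candidates)  # ties go to the higher layer index
        if trial_score <= best + min_gain:
            break  # no single removal helps enough; stop pruning
        kept.remove(layer)
        best = trial_score
    return kept

# Toy usage: layers 1 and 3 are hypothetical "harmful" layers costing one point each.
HARMFUL = {1, 3}

def toy_score(kept: Sequence[int]) -> float:
    return -float(sum(1 for l in kept if l in HARMFUL))

print(greedy_layer_pruning(5, toy_score))  # [0, 2, 4]
```

Each round evaluates one candidate per remaining layer, so a full run needs O(L²) evaluations in the worst case. Under the geometric account above, the layers such a loop removes would be those that create or amplify the OOD distortion; that link is the paper's empirical claim, not something the sketch enforces.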
