Curvature-Weighted Capacity Allocation: A Minimum Description Length Framework for Layer-Adaptive Large Language Model Optimization

Theophilus Amaefuna; Hitesh Vaidya; Anshuman Chhabra; Ankur Mali

arXiv:2603.00910 · cs.IT · March 3, 2026

TL;DR

This paper introduces a curvature-aware framework, grounded in the Minimum Description Length (MDL) principle, for layer-adaptive capacity allocation and pruning in large language models under hardware constraints.

Contribution

It develops a novel curvature-adjusted layer gain metric and formulates convex MDL programs with closed-form solutions for optimal capacity allocation and pruning.

Findings

01. The curvature-adjusted layer gain equals twice the maximal second-order reduction in empirical risk achievable by updating a single layer alone.

02. The proposed programs have unique solutions computable via bisection.

03. The framework offers provable optimality and generalization guarantees.

Abstract

Layer-wise capacity in large language models is highly non-uniform: some layers contribute disproportionately to loss reduction while others are near-redundant. Existing methods for exploiting this non-uniformity, such as influence-function-based layer scoring, produce sensitivity estimates but offer no principled mechanism for translating them into allocation or pruning decisions under hardware constraints. We address this gap with a unified, curvature-aware framework grounded in the Minimum Description Length (MDL) principle. Our central quantity is the curvature-adjusted layer gain $\zeta_k^2 = g_k^\top H_{kk}^{-1} g_k$, which we show equals twice the maximal second-order reduction in empirical risk achievable by updating layer $k$ alone, and which strictly dominates gradient-norm-based scores by incorporating local curvature. Normalizing these gains into layer quality…
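
The "twice the maximal second-order reduction" statement in the abstract can be checked with a short derivation. The sketch below is not taken from the paper: it assumes that $H_{kk}$ is symmetric positive definite, and the symbols $\delta_k$ (an update applied to layer $k$ only) and $\Delta\hat{R}$ (the resulting change in empirical risk) are notation introduced here for illustration.

```latex
% Assumptions (not from the source): H_{kk} is symmetric positive definite;
% \delta_k and \Delta\hat{R} are illustrative symbols.
\begin{align*}
% Second-order model of the risk change when only layer k moves by \delta_k
\Delta\hat{R}(\delta_k) &\approx g_k^{\top}\delta_k
    + \tfrac{1}{2}\,\delta_k^{\top} H_{kk}\,\delta_k \\
% Setting the gradient g_k + H_{kk}\delta_k to zero gives the minimizer
\delta_k^{\star} &= -H_{kk}^{-1} g_k \\
% Substituting \delta_k^{\star} back into the quadratic model
\Delta\hat{R}(\delta_k^{\star}) &\approx -\,g_k^{\top} H_{kk}^{-1} g_k
    + \tfrac{1}{2}\, g_k^{\top} H_{kk}^{-1} g_k
    = -\tfrac{1}{2}\,\zeta_k^{2}
\end{align*}
```

Under these assumptions the largest achievable second-order reduction is $\tfrac{1}{2}\zeta_k^2$, so $\zeta_k^2$ is twice that reduction. In the special case $H_{kk} = I$ the gain collapses to the squared gradient norm $\lVert g_k \rVert^2$, which illustrates how the curvature term is what distinguishes it from a gradient-norm score.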

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Topic Modeling · Constraint Satisfaction and Optimization