Curvature-Weighted Capacity Allocation: A Minimum Description Length Framework for Layer-Adaptive Large Language Model Optimization
Theophilus Amaefuna, Hitesh Vaidya, Anshuman Chhabra, Ankur Mali

TL;DR
This paper introduces a curvature-aware, MDL-based framework for layer-adaptive capacity allocation and pruning in large language models, built on convex programs with provable optimality and generalization guarantees.
Contribution
It develops a novel curvature-adjusted layer gain metric and formulates convex MDL programs with closed-form solutions for optimal capacity allocation and pruning.
Findings
The curvature-adjusted layer gain equals twice the maximal second-order reduction in empirical risk achievable by updating a single layer.
The proposed programs have unique solutions computable via bisection.
The framework offers provable optimality and generalization guarantees.
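To make the bisection finding concrete, here is a minimal sketch on a stand-in problem. This summary does not state the paper's MDL programs, so the objective below (maximize the sum of q[k] * log(1 + c[k]) subject to a total budget, with c[k] >= 0) is a hypothetical strictly concave allocation problem chosen only for illustration; the names `allocate`, `q`, and `budget` are assumptions, not the authors' notation. The KKT conditions give each allocation in closed form for a given multiplier, and the multiplier itself is found by bisection because total spend decreases monotonically in it.

```python
def allocate(q, budget, tol=1e-12, max_iter=200):
    """Bisection on the Lagrange multiplier of a toy allocation problem.

    Hypothetical objective (NOT the paper's program):
        maximize  sum_k q[k] * log(1 + c[k])
        subject to sum_k c[k] = budget,  c[k] >= 0
    Assumes every q[k] > 0 and budget > 0.
    """
    def total(lam):
        # Closed-form KKT allocation for a fixed multiplier lam.
        return sum(max(qk / lam - 1.0, 0.0) for qk in q)

    # total(lam) is decreasing in lam: it is 0 at lam = max(q)
    # and grows without bound as lam approaches 0.
    lo, hi = 1e-12, max(q)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total(mid) > budget:
            lo = mid   # spending too much: raise the multiplier
        else:
            hi = mid   # spending too little: lower the multiplier
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [max(qk / lam - 1.0, 0.0) for qk in q]


if __name__ == "__main__":
    # All three layers active: allocations are approximately [2.0, 0.8, 0.2].
    print(allocate([0.5, 0.3, 0.2], 3.0))
    # The low-weight layer is cut to zero: approximately [1.0, 0.0].
    print(allocate([0.9, 0.1], 1.0))
```

In the second call the low-weight entry receives exactly zero, which shows how a single multiplier can yield both allocation and pruning decisions in a problem of this shape.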
Abstract
Layer-wise capacity in large language models is highly non-uniform: some layers contribute disproportionately to loss reduction while others are near-redundant. Existing methods for exploiting this non-uniformity, such as influence-function-based layer scoring, produce sensitivity estimates but offer no principled mechanism for translating them into allocation or pruning decisions under hardware constraints. We address this gap with a unified, curvature-aware framework grounded in the Minimum Description Length (MDL) principle. Our central quantity is the curvature-adjusted layer gain, which we show equals twice the maximal second-order reduction in empirical risk achievable by updating that layer alone, and which strictly dominates gradient-norm-based scores by incorporating local curvature. Normalizing these gains into layer quality…
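The "twice the maximal second-order reduction" statement follows from a standard quadratic-model argument. The sketch below uses notation that this listing does not show: $g_\ell$ for the gradient of the empirical risk with respect to the parameters of layer $\ell$, and $H_\ell \succ 0$ for a positive-definite curvature matrix of that layer. Both symbols are assumptions, and the paper's exact curvature matrix (for example, any damping it applies) may differ.

```latex
\begin{aligned}
\Delta_\ell(\delta) &= -\Bigl( g_\ell^\top \delta + \tfrac{1}{2}\, \delta^\top H_\ell\, \delta \Bigr)
  && \text{(second-order model of the risk reduction)} \\
\delta^\star &= -H_\ell^{-1} g_\ell
  && \text{(stationarity: } g_\ell + H_\ell\, \delta = 0 \text{)} \\
\max_{\delta}\, \Delta_\ell(\delta) &= \Delta_\ell(\delta^\star) = \tfrac{1}{2}\, g_\ell^\top H_\ell^{-1} g_\ell \\
g_\ell^\top H_\ell^{-1} g_\ell &= 2 \max_{\delta}\, \Delta_\ell(\delta)
\end{aligned}
```

With $H_\ell = I$ the quantity reduces to the squared gradient norm $\lVert g_\ell \rVert^2$, so a gradient-norm score is the special case that ignores curvature.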
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Topic Modeling · Constraint Satisfaction and Optimization
