Benchmarking Distilled Language Models: Performance and Efficiency in Resource-Constrained Settings

Sachin Gopal Wani, Eric Page, Ajay Dholakia, and David Ellison

arXiv:2602.20164·cs.CL·February 25, 2026

TL;DR

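This paper benchmarks distilled language models against their vanilla and proprietary counterparts. It reports that distillation is far more compute-efficient than vanilla training and that distilled models can match or exceed the reasoning performance of much larger models, making capable AI more accessible in resource-constrained settings.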

Contribution

The paper provides a quantitative comparison of the performance and computational cost of distilled models against vanilla and proprietary models, highlighting the effectiveness of distillation.

Findings

01

Creating a distilled 8B model is over 2,000 times more compute-efficient than training its vanilla counterpart.

02

The distilled 8B model achieves reasoning capabilities on par with, or even exceeding, standard models ten times its size.

03

Distillation significantly improves the performance-to-compute ratio of language models.

Abstract

Knowledge distillation offers a transformative pathway to developing powerful, yet efficient, small language models (SLMs) suitable for resource-constrained environments. In this paper, we benchmark the performance and computational cost of distilled models against their vanilla and proprietary counterparts, providing a quantitative analysis of their efficiency. Our results demonstrate that distillation creates a superior performance-to-compute curve. We find that creating a distilled 8B model is over 2,000 times more compute-efficient than training its vanilla counterpart, while achieving reasoning capabilities on par with, or even exceeding, standard models ten times its size. These findings validate distillation not just as a compression technique, but as a primary strategy for building state-of-the-art, accessible AI.
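The two quantities the abstract refers to, a compute-efficiency ratio and a performance-to-compute ratio, can be illustrated with a minimal sketch. The function names, the choice of cost unit, and the input numbers below are all hypothetical placeholders chosen for illustration; they are not taken from the paper, which may define and measure these quantities differently.

```python
def compute_efficiency_ratio(vanilla_cost: float, distilled_cost: float) -> float:
    """Return how many times cheaper distillation is than vanilla training.

    Both costs must use the same unit (for example GPU-hours or FLOPs);
    the unit is an assumption of this sketch, not something the paper fixes here.
    """
    if vanilla_cost <= 0 or distilled_cost <= 0:
        raise ValueError("compute costs must be positive")
    return vanilla_cost / distilled_cost


def performance_per_compute(score: float, cost: float) -> float:
    """Return benchmark score obtained per unit of compute spent."""
    if cost <= 0:
        raise ValueError("compute cost must be positive")
    return score / cost


if __name__ == "__main__":
    # Placeholder inputs, NOT results from the paper.
    vanilla_cost = 1000.0   # hypothetical cost of vanilla training
    distilled_cost = 10.0   # hypothetical cost of distillation
    print(compute_efficiency_ratio(vanilla_cost, distilled_cost))
    print(performance_per_compute(50.0, distilled_cost))
```

Under this reading, a claim such as "over 2,000 times more compute-efficient" corresponds to the first function returning a value above 2,000 when given the measured costs of the two training routes.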

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications