Assessing Small Language Models for Code Generation: An Empirical Study with Benchmarks

Md Mahade Hasan; Muhammad Waseem; Kai-Kristian Kemell; Jussi Rasku; Juha Ala-Rantala; Pekka Abrahamsson

arXiv:2507.03160·cs.SE·January 21, 2026

Open Access

TL;DR

This paper empirically evaluates 20 small language models for code generation on correctness, efficiency, and multilingual capability, showing their potential for resource-constrained environments and their trade-offs against larger models.

Contribution

It provides a comprehensive benchmark of open-source small language models for code generation, highlighting their strengths, limitations, and performance trade-offs across multiple programming languages.

Findings

1. Several compact SLMs achieve competitive results.
2. Larger models outperform smaller ones but require more computational resources.
3. Performance differences across languages are generally not statistically significant.

Abstract

Recent advances in Small Language Models (SLMs) have opened new possibilities for efficient code generation. SLMs offer lightweight and cost-effective alternatives to Large Language Models (LLMs), making them attractive for use in resource-constrained environments. However, empirical understanding of SLMs, particularly their capabilities, limitations, and performance trade-offs in code generation, remains limited. This study presents a comprehensive empirical evaluation of 20 open-source SLMs ranging from 0.4B to 10B parameters on five diverse code-related benchmarks (HumanEval, MBPP, Mercury, HumanEvalPack, and CodeXGLUE). The models are assessed along three dimensions: i) functional correctness of generated code, ii) computational efficiency, and iii) performance across multiple programming languages. The findings of this study reveal that several compact SLMs achieve…
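The abstract excerpt does not say which metric is used to score functional correctness. As an illustration only, benchmarks such as HumanEval are commonly scored with pass@k: the probability that at least one of k sampled completions passes all unit tests. The sketch below assumes this metric; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical and not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled for the problem
    c: completions that passed every unit test
    k: sample budget being scored
    (Assumed metric; the abstract excerpt does not name one.)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n completions contains no passing one. math.comb returns 0
    # when k > n - c, so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single sample per problem (n = k = 1), this reduces to the fraction of problems whose generated solution passes its tests.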

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Software Engineering Research · Natural Language Processing Techniques