Exploring the Reasoning Depth of Small Language Models in Software Architecture: A Multidimensional Evaluation Framework Towards Software Engineering 2.0

Ha Vo; Nhut Tran; Khang Vo; Phat T. Tran-Truong; Son Ha

arXiv:2603.07091 · cs.SE · March 10, 2026

Open Access

TL;DR

This paper benchmarks small language models for software architecture reasoning, revealing their capabilities, limitations, and the effectiveness of various prompting and fine-tuning strategies within a new evaluation framework.

Contribution

It introduces a multidimensional evaluation framework for small language models in software architecture and provides empirical insights into their reasoning abilities and limitations.

Findings

01 Models above 3B parameters show strong zero-shot reasoning.

02 Fine-Tuning yields the strongest BERTScore gains in sub-2B models.

03 Few-Shot prompting effectively calibrates mid-sized models.

Abstract

In the era of "Software Engineering 2.0" (SE 2.0), where intelligent agents collaborate with human engineers, Generative AI is advancing beyond code generation into Software Architecture (SA). While Large Language Models (LLMs) demonstrate superior capabilities, computational costs and data privacy concerns drive interest in Small Language Models (SLMs) with fewer than 7 billion parameters. However, the reasoning limits of these resource-constrained models remain unexplored. This study benchmarks 10 state-of-the-art SLMs on Architectural Decision Records generation, introducing a multi-dimensional framework evaluating Technical Compliance and Semantic Diversity. Our empirical results reveal a significant reasoning gap: models above the 3B-parameter threshold demonstrate robust zero-shot capabilities, while sub-2B models show the strongest BERTScore gains from Fine-Tuning, though…

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Ethics and Social Impacts of AI