Hierarchical Evaluation of Software Design Capabilities of Large Language Models of Code
Mootez Saad, Boqi Chen, José Antonio Hernández López, Dániel Varró, Tushar Sharma

TL;DR
This study systematically evaluates large language models' understanding of software design concepts, revealing strengths in cohesion analysis but fragility in coupling reasoning under noisy, open-ended conditions.
Contribution
It provides a hierarchical evaluation framework for LLMs' software design capabilities, highlighting their robustness in cohesion analysis and vulnerability in coupling reasoning.
Findings
Models understand cohesion well in guided scenarios
Coupling reasoning is highly sensitive to noise and lack of guidance
Performance drops significantly in open-ended, noisy conditions
Abstract
Large language models (LLMs) are being increasingly adopted in the software engineering domain, yet the robustness of their grasp on core software design concepts remains unclear. We conduct an empirical study to systematically evaluate their understanding of cohesion (intra-module) and coupling (inter-module). We programmatically generate poorly designed code fragments and test the DeepSeek-R1 model family (B, B, B) under varying levels of guidance, from simple Verification to Guided and Open-ended Generation, while varying contextual noise by injecting distractor elements. While models exhibit a solid baseline understanding of both concepts in ideal conditions, their practical knowledge is fragile and highly asymmetrical. Reasoning about coupling proves brittle; performance collapses in noisy, open-ended scenarios, with F1 scores dropping by over…
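To make the reported F1 drops concrete, here is a minimal sketch of how an open-ended coupling answer could be scored against a known ground truth. The set-based scoring, the function name `f1_score`, and the module names are illustrative assumptions; the abstract does not state how the authors compute their metric.

```python
def f1_score(predicted, expected):
    """Set-based F1 between predicted and ground-truth items (assumed scoring scheme)."""
    predicted, expected = set(predicted), set(expected)
    if not predicted and not expected:
        return 1.0  # nothing to find and nothing reported
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth: modules the generated fragment is actually coupled to.
expected = {"OrderService", "PaymentGateway"}

# Hypothetical model answers, without and with distractor elements in the context.
clean_answer = {"OrderService", "PaymentGateway"}
noisy_answer = {"OrderService", "Logger", "StringUtils"}

clean_f1 = f1_score(clean_answer, expected)
noisy_f1 = f1_score(noisy_answer, expected)
print(f"clean F1 = {clean_f1:.2f}, noisy F1 = {noisy_f1:.2f}")
```

In this toy case the noisy answer both misses one true dependency and reports two distractors, so precision and recall fall together. That is the kind of failure an F1 drop under noise would reflect.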
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Engineering Techniques and Practices
