An Empirical Study on Capability of Large Language Models in Understanding Code Semantics
Thu-Trang Nguyen, Thanh Trong Vu, Hieu Dinh Vo, Son Nguyen

TL;DR
This study systematically evaluates large language models' ability to understand code semantics by testing their robustness and sensitivity to code transformations across multiple software engineering tasks.
Contribution
Introduces EMPICA, a framework for empirically assessing code LLMs' semantic understanding through controlled code modifications.
Findings
Models are more robust to semantic-preserving transformations than they are sensitive to non-semantic-preserving ones.
Sensitivity to non-semantic-preserving transformations varies across tasks.
Significant gaps remain in models' understanding of code semantics.
Abstract
Large Language Models for Code (code LLMs) have demonstrated remarkable performance across various software engineering (SE) tasks, increasing the application of code LLMs in software development. Despite the success of code LLMs, there remain significant concerns about the actual capabilities and reliability of these models, "whether these models really learn the semantics of code from the training data and leverage the learned knowledge to perform the SE tasks". In this paper, we introduce EMPICA, a comprehensive framework designed to systematically and empirically evaluate the capabilities of code LLMs in understanding code semantics. Specifically, EMPICA systematically introduces controlled modifications/transformations into the input code and examines the models' responses. Generally, code LLMs must be robust to semantically equivalent code inputs and be sensitive to non-equivalent…
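The two properties described in the abstract can be illustrated with a minimal sketch. It is not EMPICA's implementation: the `total` function, the two transformer classes, and `stub_model` are all hypothetical names chosen for illustration, and `stub_model` simply executes the code as a stand-in for a code LLM performing output prediction. The sketch applies one semantic-preserving transformation (a consistent variable rename) and one non-semantic-preserving transformation (replacing `+` with `-`), then checks whether the prediction stays the same in the first case and changes in the second.

```python
import ast

# Hypothetical input program used only for this illustration.
SOURCE = """
def total(values):
    acc = 0
    for v in values:
        acc = acc + v
    return acc
"""


class RenameVariable(ast.NodeTransformer):
    """Semantic-preserving: rename one local variable consistently."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


class SwapAddForSub(ast.NodeTransformer):
    """Non-semantic-preserving: replace every binary + with -."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def transform(source, transformer):
    """Parse the source, apply the transformer, and return new source text."""
    tree = transformer.visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


def stub_model(source):
    """Assumption: stands in for a code LLM predicting program output.

    It executes the code directly, so it acts as a perfect oracle here;
    a real model would be queried with the source text instead.
    """
    namespace = {}
    exec(source, namespace)
    return namespace["total"]([1, 2, 3])


original = stub_model(SOURCE)
equivalent = stub_model(transform(SOURCE, RenameVariable("acc", "running_sum")))
non_equivalent = stub_model(transform(SOURCE, SwapAddForSub()))

# Robust: the prediction is unchanged on a semantically equivalent input.
robust = original == equivalent
# Sensitive: the prediction changes on a semantically different input.
sensitive = original != non_equivalent
print(robust, sensitive)
```

Because the stub runs the code, it is trivially both robust and sensitive. Swapping in a real model's predictions for `stub_model` is where the two checks become informative.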
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies · Natural Language Processing Techniques
