Evaluating and Explaining Large Language Models for Code Using Syntactic Structures

David N Palacio; Alejandro Velasco; Daniel Rodriguez-Cardenas; Kevin Moran; Denys Poshyvanyk

arXiv:2308.03873 · cs.SE · August 9, 2023 · 1 cites

TL;DR

This paper introduces ASTxplainer, a novel explainability method for code-focused large language models, enabling better evaluation and visualization of model predictions by aligning them with syntactic structures, specifically AST nodes.

Contribution

The paper presents ASTxplainer, an automated approach to align LLM token predictions with AST nodes, facilitating improved evaluation and interpretability of code models.
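To make the idea of aligning token predictions with AST nodes concrete, here is a minimal sketch. It is not ASTxplainer's implementation, which this page does not describe. It assumes Python source code, uses the standard-library `tokenize` and `ast` modules as stand-ins for a model tokenizer and a parser, and assumes ASCII input so that token columns and AST column offsets agree. The function names `align_tokens_to_ast` and `mean_score_by_node_type` are hypothetical, and the per-token scores are placeholders rather than real model outputs.

```python
import ast
import io
import tokenize
from collections import defaultdict

# Token types that carry no syntactic content of their own.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
         tokenize.COMMENT, tokenize.ENDMARKER}


def _positioned_nodes(tree):
    """Yield (depth, node) for every AST node that carries source positions."""
    stack = [(0, tree)]
    while stack:
        depth, node = stack.pop()
        if getattr(node, "end_lineno", None) is not None:
            yield depth, node
        for child in ast.iter_child_nodes(node):
            stack.append((depth + 1, child))


def align_tokens_to_ast(source):
    """Map each lexer token to the type of the deepest AST node covering it.

    Assumption: lexer tokens stand in for model tokens. A real LLM uses
    subword tokens, which would need an extra offset-mapping step.
    """
    nodes = list(_positioned_nodes(ast.parse(source)))
    aligned = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        label, best_depth = "Module", -1  # fallback when no node covers the token
        for depth, node in nodes:
            start = (node.lineno, node.col_offset)
            end = (node.end_lineno, node.end_col_offset)
            if start <= tok.start and tok.end <= end and depth > best_depth:
                label, best_depth = type(node).__name__, depth
        aligned.append((tok.string, label))
    return aligned


def mean_score_by_node_type(aligned, scores):
    """Average hypothetical per-token scores within each AST node type."""
    if len(aligned) != len(scores):
        raise ValueError("need exactly one score per aligned token")
    buckets = defaultdict(list)
    for (_, label), score in zip(aligned, scores):
        buckets[label].append(score)
    return {label: sum(vals) / len(vals) for label, vals in buckets.items()}


if __name__ == "__main__":
    code = "def add(a, b):\n    return a + b\n"
    pairs = align_tokens_to_ast(code)
    placeholder_scores = [0.5] * len(pairs)  # made-up values, not model output
    for token, node_type in pairs:
        print(f"{token!r:10} -> {node_type}")
    print(mean_score_by_node_type(pairs, placeholder_scores))
```

Grouping scores by node type is one simple way to turn token-level predictions into a per-construct summary. Other aggregations, such as a median or a per-subtree sum, would fit the same alignment.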

Findings

01. ASTxplainer effectively aligns token predictions with AST structures.

02. Empirical evaluation on 12 popular LLMs demonstrates its practical utility.

03. User study shows ASTxplainer visualizations aid end-user understanding.

Abstract

Large Language Models (LLMs) for code are a family of high-parameter, transformer-based neural networks pre-trained on massive datasets of both natural and programming languages. These models are rapidly being employed in commercial AI-based developer tools, such as GitHub Copilot. However, measuring and explaining their effectiveness on programming tasks is a challenging proposition, given their size and complexity. The methods for evaluating and explaining LLMs for code are inextricably linked. That is, in order to explain a model's predictions, they must be reliably mapped to fine-grained, understandable concepts. Once this mapping is achieved, new methods for detailed model evaluations are possible. However, most current explainability techniques and evaluation benchmarks focus on model robustness or individual task performance, as opposed to interpreting model predictions. To…

Taxonomy

Topics: Machine Learning in Materials Science · Software Engineering Research · Ferroelectric and Negative Capacitance Devices
