Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations
David N. Palacio, Daniel Rodriguez-Cardenas, Alejandro Velasco, Dipin Khati, Kevin Moran, Denys Poshyvanyk

TL;DR
This paper introduces ASTrust, an interpretability method for LLMs of code that grounds its explanations in programming-language syntax. It relates model confidence to Abstract Syntax Tree structures so that practitioners can better understand, and therefore trust, model predictions.
Contribution
A new syntax-grounded interpretability technique for LLMs on code, linking model confidence to Abstract Syntax Tree structures for better transparency.
Findings
A visualization tool for model confidence on syntactic structures.
Improved interpretability demonstrated on 12 popular LLMs.
Positive feedback from human study on ASTrust's usefulness.
Abstract
Trustworthiness and interpretability are inextricably linked concepts for LLMs. The more interpretable an LLM is, the more trustworthy it becomes. However, current techniques for interpreting LLMs when applied to code-related tasks largely focus on accuracy measurements, measures of how models react to change, or individual task performance instead of the fine-grained explanations needed at prediction time for greater interpretability, and hence trust. To improve upon this status quo, this paper introduces ASTrust, an interpretability method for LLMs of code that generates explanations grounded in the relationship between model confidence and syntactic structures of programming languages. ASTrust explains generated code in the context of syntax categories based on Abstract Syntax Trees and aids practitioners in understanding model predictions at both local (individual code snippets) and…
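To make the idea of relating model confidence to syntactic structures concrete, here is a minimal sketch. It is not the authors' implementation. It assumes Python source code, uses Python's built-in ast and tokenize modules as a stand-in for whatever parser ASTrust uses, and takes made-up per-token confidence values in place of real model probabilities. The function names innermost_node and confidence_by_syntax are hypothetical. Each token is assigned to the smallest AST node that covers it, and confidences are averaged per node type.

```python
import ast
import io
import tokenize
from collections import defaultdict

# Layout-only tokens that carry no syntactic content of their own.
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
        tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def innermost_node(tree, line, col):
    """Return the smallest positioned AST node covering (line, col), or None."""
    best, best_size = None, None
    for node in ast.walk(tree):  # breadth-first, so parents come before children
        if getattr(node, "lineno", None) is None or getattr(node, "end_lineno", None) is None:
            continue  # Module, Load, operators, etc. have no source span
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        if start <= (line, col) < end:
            size = (end[0] - start[0], end[1] - start[1])
            # "<=" lets a child with the same span as its parent win the tie.
            if best is None or size <= best_size:
                best, best_size = node, size
    return best


def confidence_by_syntax(code, token_confidences):
    """Average hypothetical token confidences per AST node type.

    Assumes ASCII source, so tokenize columns match ast col_offset values.
    """
    tree = ast.parse(code)
    tokens = [t for t in tokenize.generate_tokens(io.StringIO(code).readline)
              if t.type not in SKIP]
    if len(tokens) != len(token_confidences):
        raise ValueError("expected one confidence value per token")
    buckets = defaultdict(list)
    for tok, conf in zip(tokens, token_confidences):
        node = innermost_node(tree, *tok.start)
        category = type(node).__name__ if node is not None else "Module"
        buckets[category].append(conf)
    return {cat: sum(vals) / len(vals) for cat, vals in buckets.items()}


code = "def add(a, b):\n    return a + b\n"
# Made-up confidences, one per token: def add ( a , b ) : return a + b
confs = [0.9, 0.5, 0.9, 0.8, 0.9, 0.6, 0.9, 0.9, 0.7, 0.4, 0.3, 0.6]
scores = confidence_by_syntax(code, confs)
for category, mean_conf in sorted(scores.items()):
    print(f"{category}: {mean_conf:.2f}")
```

In this toy input the operator token is given the lowest confidence, so the BinOp category ends up with the lowest average. A per-category summary of this kind is the sort of signal a syntax-grounded explanation could surface for a single snippet. Aggregating the same averages over many snippets would give a corpus-level view.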
Taxonomy
Topics: Access Control and Trust
Methods: Sparse Evolutionary Training · Focus
