An Empirical Analysis of Static Analysis Methods for Detection and Mitigation of Code Library Hallucinations
Clarissa Miranda-Pena, Andrew Reeson, Cécile Paris, Josiah Poon, Jonathan K. Kummerfeld

TL;DR
This paper empirically evaluates static analysis tools for detecting and mitigating library hallucinations in code generated by large language models, revealing their strengths and limitations.
Contribution
It provides a comprehensive analysis of how effectively static analysis tools identify library hallucinations, highlighting both their potential and the upper bound on what they can detect.
Findings
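Static analysis tools detect 16-70% of all errors and 14-85% of library hallucinations, with performance varying by LLM and dataset.
Manual analysis identifies hallucinations that a static method could not plausibly catch, which puts the upper bound on their potential at 48.5-77%.
Static analysis is a cheap method for addressing some, but not all, forms of hallucination.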
Abstract
Despite extensive research, Large Language Models continue to hallucinate when generating code, particularly when using libraries. On NL-to-code benchmarks that require library use, we find that LLMs generate code that uses non-existent library features in 8.1-40% of responses. One intuitive approach for detection and mitigation of hallucinations is static analysis. In this paper, we analyse the potential of static analysis tools, both in terms of what they can solve and what they cannot. We find that static analysis tools can detect 16-70% of all errors, and 14-85% of library hallucinations, with performance varying by LLM and dataset. Through manual analysis, we identify cases a static method could not plausibly catch, which gives an upper bound on their potential from 48.5% to 77%. Overall, we show that static analysis methods are a cheap method for addressing some forms of…
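Illustrative sketch
To make the idea of a static check for library hallucinations concrete, the following is a minimal sketch in Python. It is not the tooling evaluated in the paper, which this summary does not name. The function name `find_library_hallucinations` and the message formats are hypothetical. The sketch parses generated code without running it, then flags imports that cannot be resolved and attributes that do not exist on an imported module. It assumes the referenced libraries are installed locally, and it imports them for introspection, so only the generated code itself is treated statically.

```python
import ast
import importlib
import importlib.util


def _module_exists(name: str) -> bool:
    """Return True if `name` can be resolved as an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def _has_member(module_name: str, member: str) -> bool:
    """True if `member` is an attribute or a submodule of `module_name`."""
    module = importlib.import_module(module_name)
    return hasattr(module, member) or _module_exists(f"{module_name}.{member}")


def find_library_hallucinations(source: str) -> list[str]:
    """Hypothetical helper: list unresolved imports and module attributes."""
    tree = ast.parse(source)
    aliases: dict[str, str] = {}  # local name -> module it refers to
    problems: list[str] = []

    # Pass 1: resolve every absolute import statement.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if not _module_exists(alias.name):
                    problems.append(f"unknown module: {alias.name}")
                elif alias.asname:
                    aliases[alias.asname] = alias.name
                else:
                    # `import a.b` binds only the top-level name `a`.
                    top = alias.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if not _module_exists(node.module):
                problems.append(f"unknown module: {node.module}")
                continue
            for alias in node.names:
                if alias.name != "*" and not _has_member(node.module, alias.name):
                    problems.append(f"unknown attribute: {node.module}.{alias.name}")

    # Pass 2: check `alias.attr` accesses on imported modules.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in aliases
        ):
            module_name = aliases[node.value.id]
            if not _has_member(module_name, node.attr):
                problems.append(f"unknown attribute: {module_name}.{node.attr}")

    return problems


if __name__ == "__main__":
    generated = (
        "import math\n"
        "import json as js\n"
        "print(math.sqrt(4), math.cube_root(8), js.to_yaml({}))\n"
    )
    for problem in find_library_hallucinations(generated):
        print(problem)
```

On the sample input, the sketch flags `math.cube_root` and `json.to_yaml` and accepts `math.sqrt`. It illustrates why static detection has an upper bound: methods called on objects returned by a library, wrong argument names or values, and names shadowed by local variables all pass through this check unnoticed.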
