On Mitigating Code LLM Hallucinations with API Documentation
Nihal Jain, Robert Kwiatkowski, Baishakhi Ray, Murali Krishna Ramanathan, Varun Kumar

TL;DR
This paper investigates API hallucinations in code LLMs, introduces the CloudAPIBench benchmark, and proposes methods that improve API invocation accuracy, especially for low frequency APIs, by selectively leveraging documentation based on an API index or model confidence scores.
Contribution
We introduce the CloudAPIBench benchmark for measuring API hallucinations and propose a dynamic approach that triggers documentation augmentation based on confidence scores.
Findings
Code LLMs perform poorly on low frequency APIs (e.g., GPT-4o achieves only 38.58% valid invocations).
Documentation Augmented Generation improves low frequency API performance to 47.94%.
Proposed methods increase overall API invocation reliability by 8.20%.
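The validity percentages above can be read as the share of generated API invocations that refer to a real API. The following is a minimal sketch of such a metric, assuming validity means only that the invoked name appears in a known API index; the benchmark's actual criterion may be stricter. The function name `valid_invocation_rate` and the example API strings are illustrative and not taken from the paper.

```python
from typing import Iterable, Set


def valid_invocation_rate(invoked_apis: Iterable[str], known_apis: Set[str]) -> float:
    """Return the percentage of invoked API names found in known_apis.

    Assumption: an invocation counts as valid if its name exists in the index.
    """
    calls = list(invoked_apis)
    if not calls:
        return 0.0
    valid = sum(1 for name in calls if name in known_apis)
    return 100.0 * valid / len(calls)


# Hypothetical API index and hypothetical model outputs (illustrative only).
known = {"s3.list_objects_v2", "s3.put_object", "ec2.describe_instances"}
generated = [
    "s3.list_objects_v2",
    "s3.fetch_all_objects",  # not in the index, so counted as a hallucination
    "ec2.describe_instances",
    "s3.put_object",
]

rate = valid_invocation_rate(generated, known)
print(f"valid invocations: {rate:.2f}%")
```

Grouping the invocations by how often each API appears in public code before computing this rate would give per-frequency figures of the kind reported above.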
Abstract
In this study, we address the issue of API hallucinations in various software engineering contexts. We introduce CloudAPIBench, a new benchmark designed to measure API hallucination occurrences. CloudAPIBench also provides annotations for frequencies of API occurrences in the public domain, allowing us to study API hallucinations at various frequency levels. Our findings reveal that Code LLMs struggle with low frequency APIs: for example, GPT-4o achieves only 38.58% valid low frequency API invocations. We demonstrate that Documentation Augmented Generation (DAG) significantly improves performance for low frequency APIs (increase to 47.94% with DAG) but negatively impacts high frequency APIs when using sub-optimal retrievers (a 39.02% absolute drop). To mitigate this, we propose to intelligently trigger DAG where we check against an API index or leverage Code LLMs' confidence scores to…
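The two triggering ideas mentioned in the abstract (an API index check and a confidence-score check) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the function names, the use of the geometric mean of token probabilities as the confidence score, and the 0.8 threshold are all hypothetical choices made for this example.

```python
import math
from typing import Sequence, Set


def trigger_by_index(api_name: str, api_index: Set[str]) -> bool:
    """Trigger documentation retrieval when the generated API name is unknown.

    Assumption: api_index holds the names of all real APIs.
    """
    return api_name not in api_index


def trigger_by_confidence(token_logprobs: Sequence[float], threshold: float = 0.8) -> bool:
    """Trigger retrieval when the model's confidence in the API name is low.

    Assumption: confidence is the geometric mean of the probabilities of the
    tokens that spell the API name; the threshold is an arbitrary example.
    """
    if not token_logprobs:
        return True  # no evidence of confidence, so fall back to retrieval
    confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    return confidence < threshold


# Hypothetical index and hypothetical per-token log-probabilities.
index = {"s3.list_objects_v2", "s3.put_object"}
confident = [math.log(0.9), math.log(0.95)]
uncertain = [math.log(0.3), math.log(0.5)]

print(trigger_by_index("s3.fetch_all_objects", index))  # unknown name
print(trigger_by_confidence(confident))
print(trigger_by_confidence(uncertain))
```

Either check lets the system skip retrieval for APIs the model already handles well, which is the case where the abstract reports that a sub-optimal retriever hurts performance.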
Taxonomy
Topics: Security and Verification in Computing · Advanced Malware Detection Techniques · Electrostatic Discharge in Electronics
