Combining Static Code Analysis and Large Language Models Improves Correctness and Performance of Algorithm Recognition
Denis Neumüller, Sebastian Boll, David Schüler, Matthias Tichy

TL;DR
Combining static code analysis with large language models improves both the accuracy and the efficiency of algorithm recognition: it reduces LLM calls by over 70% (up to 97.5%) and improves F1-scores by up to 12 percentage points.
Contribution
This paper introduces a hybrid approach that combines static analysis with LLMs, reducing runtime and improving classification performance in algorithm recognition tasks.
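The idea of putting a cheap static check in front of the LLM can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the filter pattern (`has_nested_loop`), the stand-in classifier (`fake_llm`), the label `"bubble_sort"`, and the use of Python's `ast` module are all hypothetical choices made for this example. The page does not say which language or which filter patterns the paper uses.

```python
import ast
from typing import Callable, List, Optional, Tuple


def has_nested_loop(source: str) -> bool:
    """Hypothetical filter pattern: True if any loop contains another loop."""
    loops = (ast.For, ast.While)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, loops):
            # ast.walk yields the node itself as well, so exclude it.
            inner = (n for n in ast.walk(node) if n is not node)
            if any(isinstance(n, loops) for n in inner):
                return True
    return False


def classify_with_filter(
    snippets: List[str],
    static_filter: Callable[[str], bool],
    llm_classify: Callable[[str], str],
) -> Tuple[List[Optional[str]], int]:
    """Call the (expensive) classifier only on snippets that pass the filter."""
    labels: List[Optional[str]] = []
    calls = 0
    for src in snippets:
        if static_filter(src):
            calls += 1
            labels.append(llm_classify(src))
        else:
            labels.append(None)  # rejected statically, no LLM call made
    return labels, calls


def fake_llm(source: str) -> str:
    """Stand-in for a real LLM call (assumption: returns a fixed label)."""
    return "bubble_sort"


SNIPPETS = [
    "def f(a):\n"
    "    for i in range(len(a)):\n"
    "        for j in range(len(a) - 1):\n"
    "            if a[j] > a[j + 1]:\n"
    "                a[j], a[j + 1] = a[j + 1], a[j]\n",
    "def g(a):\n    return sum(a)\n",
    "def h(a):\n    for x in a:\n        print(x)\n",
]

labels, calls = classify_with_filter(SNIPPETS, has_nested_loop, fake_llm)
```

In this toy run only one of the three snippets reaches the classifier. The saving comes entirely from how selective the filter is, which is why the choice of filter pattern matters.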
Findings
Combined approach reduces LLM calls by up to 97.5%
F1-scores improve by up to 12 percentage points
Effective even with obfuscated identifiers
Abstract
Context: Since it is well established that developers spend a substantial portion of their time understanding source code, the ability to automatically identify algorithms within source code presents a valuable opportunity. This capability can support program comprehension, facilitate maintenance, and enhance overall software quality.
Objective: We empirically evaluate how combining LLMs with static code analysis can improve the automated recognition of algorithms, while also evaluating the standalone performance of LLMs and their dependence on identifier names.
Method: We perform multiple experiments evaluating the combination of LLMs with static analysis using different filter patterns. We compare this combined approach against the standalone performance of LLMs under various prompting strategies and investigate the impact of systematic identifier obfuscation on classification performance and runtime.…
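To make "systematic identifier obfuscation" concrete, here is a minimal sketch of one way it could be done. It is an assumption for illustration only: the abstract does not describe the authors' obfuscation procedure, and the class name `Obfuscator`, the `v0, v1, ...` naming scheme, and the use of Python's `ast` module (with `ast.unparse`, Python 3.9+) are hypothetical. The sketch renames function names, parameters, and assigned variables, and leaves names that are never bound in the snippet (such as builtins) untouched.

```python
import ast


class Obfuscator(ast.NodeTransformer):
    """Rename locally bound identifiers to meaningless names v0, v1, ..."""

    def __init__(self) -> None:
        self.mapping: dict = {}

    def _alias(self, name: str) -> str:
        # Reuse the alias if the name was seen before, else create a new one.
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._alias(node.name)
        self.generic_visit(node)  # visits parameters first, then the body
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Rename on assignment, or on use of an already renamed name.
        # Names never bound in the snippet (e.g. builtins) stay as they are.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._alias(node.id)
        return node


def obfuscate(source: str) -> str:
    """Return the source with bound identifiers replaced by v0, v1, ..."""
    tree = Obfuscator().visit(ast.parse(source))
    return ast.unparse(tree)


original = "def bubble_sort(items):\n    total = len(items)\n    return total\n"
obfuscated = obfuscate(original)
```

After this transformation the structure of the code is unchanged, but a classifier can no longer rely on a telling name such as `bubble_sort`. That is the kind of dependence on identifier names the experiments are meant to probe.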
