Efficient Code Analysis via Graph-Guided Large Language Models
Hang Gao, Tao Peng, Baoquan Cui, Hong Huang, Fengge Wu, Junsuo Zhao, Jian Zhang

TL;DR
This paper introduces a graph-guided approach that combines LLMs and GNNs to improve detection of malicious code across files, enhancing accuracy while reducing annotation effort.
Contribution
It presents a novel graph-centric pipeline that guides LLMs with GNN-based insights for more effective malicious code detection across project files.
Findings
Outperforms existing methods on multiple datasets
Reduces interference from irrelevant context during code analysis
Maintains low annotation costs for training
Abstract
Large Language Models (LLMs) have significantly advanced code analysis tasks, yet they struggle to detect malicious behaviors fragmented across files, whose intricate dependencies easily get lost in the vast amount of benign code. We therefore propose a graph-centric attention acquisition pipeline that enhances LLMs' ability to localize malicious behavior. The approach parses a project into a code graph, uses an LLM to encode nodes with semantic and structural signals, and trains a Graph Neural Network (GNN) under sparse supervision. The GNN performs an initial detection, and by interpreting these predictions, identifies key code sections that are most likely to contain malicious behavior. These influential regions are then used to guide the LLM's attention for in-depth analysis. This strategy significantly reduces interference from irrelevant context while maintaining low annotation…
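The pipeline described in the abstract (build a code graph, score nodes with a graph model, then pass only the highest-scoring regions to the LLM) can be illustrated with a small sketch. This is not the authors' implementation. The node names, the three hand-made feature flags (network I/O, file write, obfuscation), the fixed weights, and the single mean-aggregation step standing in for a trained GNN are all assumptions made for illustration. In the paper, node features come from an LLM encoder and the GNN is trained under sparse supervision.

```python
from collections import defaultdict


def build_code_graph(edges):
    """Build an undirected adjacency list from (caller, callee) pairs."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def aggregate(features, adj):
    """One round of mean-neighbor message passing (a stand-in for a GNN layer)."""
    hidden = {}
    for node, feat in features.items():
        neighbors = [features[n] for n in adj.get(node, ())]
        if neighbors:
            mean = [sum(col) / len(neighbors) for col in zip(*neighbors)]
        else:
            mean = [0.0] * len(feat)
        # Blend each node's own features with its neighborhood average.
        hidden[node] = [0.5 * f + 0.5 * m for f, m in zip(feat, mean)]
    return hidden


def score_nodes(hidden, weights):
    """Linear suspicion score per node; the weights here are hypothetical."""
    return {n: sum(w * h for w, h in zip(weights, vec)) for n, vec in hidden.items()}


def select_regions(scores, k):
    """Return the k highest-scoring nodes, breaking ties by name."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical project: three linked functions and one isolated, benign one.
edges = [("setup.py:install", "utils.py:decode"), ("utils.py:decode", "net.py:fetch")]
features = {
    "setup.py:install": [0.0, 1.0, 0.0],  # [network_io, file_write, obfuscation]
    "utils.py:decode": [0.0, 0.0, 1.0],
    "net.py:fetch": [1.0, 0.0, 0.0],
    "readme.py:docs": [0.0, 0.0, 0.0],
}

adj = build_code_graph(edges)
scores = score_nodes(aggregate(features, adj), weights=[1.0, 0.5, 1.0])
top = select_regions(scores, k=2)

# Only the selected regions would be placed in the LLM prompt for in-depth analysis.
prompt = "Inspect these code regions for malicious behavior:\n" + "\n".join(top)
```

The isolated, featureless node scores zero and is left out of the prompt, which is the effect the abstract describes as reducing interference from irrelevant context.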
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
