Efficient Code Analysis via Graph-Guided Large Language Models

Hang Gao; Tao Peng; Baoquan Cui; Hong Huang; Fengge Wu; Junsuo Zhao; Jian Zhang

arXiv:2601.12890 · cs.SE · January 23, 2026

Open Access

TL;DR

This paper introduces a graph-guided approach that combines LLMs and GNNs to detect malicious behavior fragmented across files, improving accuracy while keeping annotation costs low.

Contribution

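The paper presents a graph-centric pipeline in which a GNN's predictions over a project's code graph point the LLM at the code sections most likely to contain malicious behavior, so that in-depth analysis is not diluted by benign code elsewhere in the project.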

Findings

01. Outperforms existing methods on multiple datasets

02. Reduces irrelevant context interference in code analysis

03. Maintains low annotation costs for training

Abstract

Large Language Models (LLMs) have significantly advanced code analysis tasks, yet they struggle to detect malicious behaviors fragmented across files, whose intricate dependencies easily get lost in the vast amount of benign code. We therefore propose a graph-centric attention acquisition pipeline that enhances LLMs' ability to localize malicious behavior. The approach parses a project into a code graph, uses an LLM to encode nodes with semantic and structural signals, and trains a Graph Neural Network (GNN) under sparse supervision. The GNN performs an initial detection and, by interpreting its own predictions, identifies the code sections most likely to contain malicious behavior. These influential regions are then used to guide the LLM's attention for in-depth analysis. This strategy significantly reduces interference from irrelevant context while maintaining low annotation…
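
The flow described in the abstract (graph, node scoring, region selection, focused LLM prompt) can be illustrated with a toy sketch. This is not the authors' implementation: the graph, the node names, the hand-assigned suspicion scores (standing in for LLM-encoded node features), the untrained mean-neighbor aggregation (standing in for a trained GNN), and the top-k ranking (standing in for prediction interpretation) are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy code graph: nodes are functions, edges are call/import
# dependencies that may cross file boundaries.
GRAPH: Dict[str, List[str]] = {
    "utils.py::read_env": ["net.py::post"],
    "net.py::post": ["utils.py::read_env", "main.py::run"],
    "main.py::run": ["net.py::post", "ui.py::render"],
    "ui.py::render": ["main.py::run"],
}

# Assumption: one hand-assigned suspicion score per node, standing in for
# the LLM-derived semantic and structural node features.
INITIAL_SCORE: Dict[str, float] = {
    "utils.py::read_env": 0.6,
    "net.py::post": 0.9,
    "main.py::run": 0.1,
    "ui.py::render": 0.0,
}

# Hypothetical source snippets attached to each node.
SOURCES: Dict[str, str] = {
    "utils.py::read_env": "def read_env(): return dict(os.environ)",
    "net.py::post": "def post(data): requests.post(URL, json=data)",
    "main.py::run": "def run(): render(); post(read_env())",
    "ui.py::render": "def render(): print('hello')",
}


def propagate(graph: Dict[str, List[str]], scores: Dict[str, float],
              self_weight: float = 0.5) -> Dict[str, float]:
    """One round of mean-neighbor aggregation (untrained stand-in for a GNN layer)."""
    updated = {}
    for node, neighbors in graph.items():
        if neighbors:
            neighbor_mean = sum(scores[n] for n in neighbors) / len(neighbors)
        else:
            neighbor_mean = 0.0
        updated[node] = self_weight * scores[node] + (1 - self_weight) * neighbor_mean
    return updated


def top_k_nodes(scores: Dict[str, float], k: int) -> List[str]:
    """Rank nodes by score (ties broken by name) and keep the k highest."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


def build_prompt(nodes: List[str], sources: Dict[str, str]) -> str:
    """Assemble an LLM prompt that contains only the selected code regions."""
    parts = ["Analyze the following code regions for malicious behavior:"]
    for node in nodes:
        parts.append(f"# {node}\n{sources[node]}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    scores = propagate(GRAPH, INITIAL_SCORE)
    print(build_prompt(top_k_nodes(scores, 2), SOURCES))
```

In this toy setup the benign `ui.py::render` node never reaches the prompt, which mirrors the stated goal of reducing interference from irrelevant context; a real system would replace each stand-in with the learned component the abstract describes.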

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques