Breaking Obfuscation: Cluster-Aware Graph with LLM-Aided Recovery for Malicious JavaScript Detection

Zhihong Liang; Xin Wang; Zhenhuang Hu; Liangliang Song; Lin Chen; Jingjing Guo; Yanbin Wang; Ye Tian

arXiv:2507.22447 · cs.CR · July 31, 2025

TL;DR

This paper introduces DeCoda, a hybrid framework combining LLM-based deobfuscation and hierarchical graph learning to improve detection of malicious, obfuscated JavaScript code with high accuracy and low false positives.

Contribution

It presents a novel multi-stage LLM deobfuscation pipeline and a cluster-aware graph learning method that effectively captures code semantics and structure for malware detection.

Findings

01

Achieves F1-scores of 94.64% and 97.71% on benchmark datasets.

02

Outperforms state-of-the-art baselines by over 10% in F1-score.

03

Significantly improves true positive rates at low false positive levels.

Abstract

With the rapid expansion of web-based applications and cloud services, malicious JavaScript code continues to pose significant threats to user privacy, system integrity, and enterprise security. However, detecting such threats remains challenging due to sophisticated code obfuscation techniques and JavaScript's inherent language characteristics, particularly its nested closure structures and syntactic flexibility. In this work, we propose DeCoda, a hybrid defense framework that combines large language model (LLM)-based deobfuscation with code graph learning: (1) We first construct a sophisticated prompt-learning pipeline with multi-stage refinement, where the LLM progressively reconstructs the original code structure from obfuscated inputs and then generates normalized Abstract Syntax Tree (AST) representations; (2) In JavaScript ASTs, dynamic typing scatters semantically similar nodes…
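The multi-stage refinement idea in step (1) can be illustrated with a minimal Python sketch. It is not DeCoda's implementation: the stage prompts in `STAGE_PROMPTS`, the function name `deobfuscate`, and the `llm` callable are all hypothetical, chosen only to show how each stage's output can feed the next stage's prompt. The AST-generation step is not shown.

```python
from typing import Callable, Sequence

# Hypothetical stage prompts (assumption): the paper's actual prompts are not
# given in this summary. Each template has a single {code} placeholder.
STAGE_PROMPTS: Sequence[str] = (
    "Decode any encoded string literals in this JavaScript:\n{code}",
    "Rename mangled identifiers and simplify control flow:\n{code}",
    "Return the cleaned, consistently formatted JavaScript:\n{code}",
)


def deobfuscate(
    code: str,
    llm: Callable[[str], str],
    prompts: Sequence[str] = STAGE_PROMPTS,
) -> str:
    """Run the code through each stage in order.

    `llm` is any function mapping a prompt string to a completion string
    (assumption: it returns only the rewritten code). The output of one
    stage becomes the input of the next, so the code is refined step by step.
    """
    for template in prompts:
        code = llm(template.format(code=code))
    return code
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, and lets the pipeline be exercised with a deterministic stub in place of a real model.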
