MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization
Haolang Lu, Hongrui Peng, Guoshun Nan, Jiaoyang Cui, Cheng Wang, Weifei Jin, Songtao Wang, Shengli Pan, Xiaofeng Tao

TL;DR
MALSIGHT introduces an iterative malware summarization framework that leverages malicious source code and benign pseudocode, utilizing a novel dataset and a specialized LLM to improve the accuracy and completeness of binary malware descriptions.
Contribution
The paper presents the first malware summary dataset, MalS and MalP, and a new LLM-based model MalT5, enhancing malware summarization through iterative pseudocode exploration.
Findings
MalT5 achieves comparable performance to larger models with fewer parameters.
Iterative pseudocode feeding improves summary quality and understanding.
The BLEURT-sum benchmark effectively measures summary quality.
Abstract
Binary malware summarization aims to automatically generate human-readable descriptions of malware behaviors from executable files, facilitating tasks like malware cracking and detection. Previous methods based on Large Language Models (LLMs) have shown great promise. However, they still face significant issues, including poor usability, inaccurate explanations,and incomplete summaries, primarily due to the obscure pseudocode structure and the lack of malware training summaries. Further, calling relationships between functions, which involve the rich interactions within a binary malware, remain largely underexplored. To this end, we propose MALSIGHT, a novel code summarization framework that can iteratively generate descriptions of binary malware by exploring malicious source code and benign pseudocode. Specifically, we construct the first malware summary dataset, MalS and MalP, using…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Malware Detection Techniques · Network Security and Intrusion Detection · Information and Cyber Security
