Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware

Brandon J Walton; Mst Eshita Khatun; James M Ghawaly; Aisha Ali-Gombe

arXiv:2501.04848 · cs.CR · January 10, 2025

TL;DR

This paper investigates using a GPT-4o-mini-based LLM with strategic prompt engineering to enhance Android malware analysis. The approach speeds up identification and categorization of malware and helps pinpoint malicious code snippets, all without fine-tuning.

Contribution

The paper introduces a novel LLM-based framework that improves malware categorization and summarization for Android apps through hierarchical prompts and backward tracing techniques.
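
The idea of hierarchical prompts followed by backward tracing can be illustrated with a minimal sketch. This is not the authors' implementation: the two-level hierarchy (method, then class, then app), the prompt wording, the `LLM` callable type, and the `SUSPICIOUS` marker convention are all assumptions made for illustration, and a stub stands in for a real model call.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def summarize_app(app: Dict[str, Dict[str, str]], llm: LLM) -> dict:
    """Summarize bottom-up. `app` maps class name -> {method name -> source}.

    The levels and prompt texts are illustrative assumptions.
    """
    method_summaries: Dict[str, Dict[str, str]] = {}
    class_summaries: Dict[str, str] = {}
    for cls, methods in app.items():
        # Lowest tier: one summary per method.
        method_summaries[cls] = {
            name: llm(f"Summarize what this method does:\n{src}")
            for name, src in methods.items()
        }
        # Middle tier: the class summary is built only from method summaries.
        joined = "\n".join(f"{n}: {s}" for n, s in method_summaries[cls].items())
        class_summaries[cls] = llm(
            f"Summarize this class from its method summaries:\n{joined}"
        )
    # Top tier: the app summary is built only from class summaries.
    joined = "\n".join(f"{c}: {s}" for c, s in class_summaries.items())
    app_summary = llm(
        f"Summarize this app from its class summaries and name a malware category:\n{joined}"
    )
    return {"methods": method_summaries, "classes": class_summaries, "app": app_summary}


def trace_back(result: dict, is_suspicious: Callable[[str], bool]) -> List[str]:
    """Walk from flagged class summaries down to the flagged methods."""
    hits: List[str] = []
    for cls, cls_summary in result["classes"].items():
        if not is_suspicious(cls_summary):
            continue  # skip classes whose summary raised no flag
        for name, summary in result["methods"][cls].items():
            if is_suspicious(summary):
                hits.append(f"{cls}.{name}")
    return hits


def fake_llm(prompt: str) -> str:
    """Stub model (assumption): flags anything mentioning an SMS send call."""
    if "sendTextMessage" in prompt or "SUSPICIOUS" in prompt:
        return "SUSPICIOUS: sends SMS"
    return "benign helper"


demo_app = {
    "Sms": {"send": "smsManager.sendTextMessage(num, null, body, null, null)",
            "log": "Log.d(TAG, msg)"},
    "Ui": {"draw": "canvas.drawRect(r, p)"},
}
demo_result = summarize_app(demo_app, fake_llm)
demo_hits = trace_back(demo_result, lambda s: "SUSPICIOUS" in s)
print(demo_hits)  # ['Sms.send']
```

Because each tier sees only the summaries from the tier below, the top-level prompt stays short, and the stored intermediate summaries give the trace a path back down to individual methods.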

Findings

1. Achieves up to 77% classification accuracy without fine-tuning.

2. Provides detailed summaries at multiple levels of malware analysis.

3. Enables pinpointing of malicious code snippets via backward tracing.

Abstract

Malware analysis is a complex process of examining and evaluating malicious software's functionality, origin, and potential impact. This arduous process typically involves dissecting the software to understand its components, infection vector, propagation mechanism, and payload. Over the years, deep reverse engineering of malware has become increasingly tedious, mainly due to the fast evolution and sophistication of modern malicious codebases. Essentially, analysts are tasked with identifying the elusive needle in the haystack within the complexities of zero-day malware, all while under tight time constraints. Thus, in this paper, we explore leveraging Large Language Models (LLMs) for semantic malware analysis to expedite the analysis of known and novel samples. Built on the GPT-4o-mini model, the proposed framework is designed to augment malware analysis for Android through a hierarchical-tiered summarization…

Taxonomy

Topics: Advanced Malware Detection Techniques · Cybercrime and Law Enforcement Studies · Network Security and Intrusion Detection