XGen-Q: An Explainable Domain-Adaptive LLM Framework with Retrieval-Augmented Generation for Software Security

Hamed Jelodar; Mohammad Meymani; Roozbeh Razavi-Far; Ali A. Ghorbani

arXiv:2510.19006 · cs.IR · October 23, 2025

TL;DR

XGen-Q is an explainable, domain-adapted large language model designed for malware detection, leveraging retrieval-augmented generation and extensive training on obfuscated malware to improve generalization and interpretability in cybersecurity.

Contribution

The paper introduces XGen-Q, a novel LLM framework pretrained on malware data, employing multi-stage prompting and retrieval-augmented generation for robust, explainable malware analysis.

Findings

01. Achieves lower perplexity than baselines

02. Performs well on unseen malware samples

03. Provides detailed forensic reports
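The first finding refers to perplexity, the standard language-model metric defined as the exponential of the average negative log-likelihood per token; lower values mean the model assigns higher probability to the evaluation text. The snippet below is a minimal sketch of that definition only. The function name `perplexity` and the toy inputs are illustrative assumptions, not taken from the paper, and the paper's own evaluation setup is not shown here.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    Hypothetical helper: exp of the mean negative log-likelihood.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Toy inputs (assumed, not from the paper): a model that spreads
# probability evenly over 4 choices vs. one that is 90% confident.
uniform = [math.log(0.25)] * 8
confident = [math.log(0.9)] * 8

print(perplexity(uniform))
print(perplexity(confident))
```

A model that is uniformly unsure among four options has a perplexity of about 4, which is why the metric is often read as an effective branching factor.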

Abstract

Generative AI and large language models (LLMs) have shown strong capabilities in code understanding, but their use in cybersecurity, particularly for malware detection and analysis, remains limited. Existing detection systems often fail to generalize to obfuscated or previously unseen threats, underscoring the need for more adaptable and explainable models. To address this challenge, we introduce XGen-Q, a domain-adapted LLM built on the Qwen-Coder architecture and pretrained on a large-scale corpus of over one million malware samples, spanning both source and assembly code. XGen-Q uses a multi-stage prompt strategy combined with retrieval-augmented generation (RAG) to deliver reliable malware identification and detailed forensic reporting, even in the presence of complex code obfuscation. To further enhance generalization, we design a training pipeline that systematically exposes the…
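The abstract names two mechanisms, retrieval-augmented generation and a multi-stage prompt strategy, without giving details in the visible text. The sketch below illustrates the general idea only: retrieve the most relevant reference notes for a code sample, then build one prompt per stage. Everything in it is an assumption for illustration, including the Jaccard-overlap retriever, the two stages (verdict, then report), the helper names `tokens`, `retrieve`, and `build_prompts`, and the toy corpus. It does not call a model and does not reproduce XGen-Q's actual pipeline.

```python
import re

def tokens(text):
    """Lowercased identifier-like tokens, as a set."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))

def retrieve(query, corpus, k=2):
    """Rank corpus entries by Jaccard overlap with the query; return the top k.

    A stand-in for a real retriever (assumed; the paper's retriever is not described here).
    """
    q = tokens(query)

    def score(doc):
        d = tokens(doc)
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompts(sample, context):
    """Build two hypothetical stage prompts: a verdict, then a forensic report."""
    notes = "\n".join(f"- {c}" for c in context)
    header = f"Reference notes:\n{notes}\n\nCode:\n{sample}\n\n"
    stage1 = header + "Is this code malicious? Answer yes or no."
    stage2 = header + "Write a forensic report explaining the verdict."
    return [stage1, stage2]

# Toy knowledge base and sample (assumed, not from the paper).
corpus = [
    "CreateRemoteThread with WriteProcessMemory suggests process injection",
    "RegSetValueEx on a Run key suggests persistence",
    "qsort sorts an array in place",
]
sample = "WriteProcessMemory(h, addr, buf, n, 0); CreateRemoteThread(h, 0, 0, addr, 0, 0, 0);"

context = retrieve(sample, corpus, k=1)
prompts = build_prompts(sample, context)
print(prompts[0])
```

In a real system each stage prompt would be sent to the model in turn, with the first stage's answer typically fed into the second; that chaining is omitted here to keep the sketch self-contained.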

Taxonomy

Topics: Advanced Malware Detection Techniques · Software Engineering Research · Digital and Cyber Forensics