Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification

Feijiang Han

arXiv:2512.05288·cs.CR·December 8, 2025


TL;DR

This paper introduces a comprehensive benchmark and study on representation learning techniques for automated classification of WebShell malware families, moving beyond detection to understanding malware lineage.

Contribution

It presents the first systematic approach to automate WebShell family classification using dynamic behavior extraction, dataset augmentation with LLMs, and extensive benchmarking of various representation methods.

Findings

01. Sequence and graph-based models outperform traditional methods.
02. Augmented datasets improve classification robustness.
03. Structure-aware algorithms show promising results.

Abstract

Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has made significant progress in WebShell detection (i.e., distinguishing malicious samples from benign ones), we argue that it is time to transition from passive detection to in-depth analysis and proactive defense. One promising direction is the automation of WebShell family classification, which involves identifying the specific malware lineage in order to understand an adversary's tactics and enable a precise, rapid response. This crucial task, however, remains a largely unexplored area that currently relies on slow, manual expert analysis. To address this gap, we present the first systematic study to automate WebShell family classification. Our method begins with extracting…


Taxonomy

Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities