LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
Md Ahsanul Haque, Ismail Hossain, Md Mahmuduzzaman Kamol, Md Jahangir Alam, Suresh Kumar Amalapuram, Sajedul Talukder, Mohammad Saidur Rahman

TL;DR
LAMDA is a comprehensive, long-term Android malware dataset designed to analyze concept drift, enabling researchers to evaluate how malware detection models' performance degrades over time due to evolving malware and benign applications.
Contribution
This paper introduces LAMDA, the largest and most temporally diverse Android malware benchmark specifically created for concept drift analysis in malware detection.
Findings
Standard ML models' performance degrades over time on LAMDA
Feature stability varies across different malware families
LAMDA enables detailed study of malware evolution and detection challenges
Abstract
Machine learning (ML)-based malware detection systems often fail to account for the dynamic nature of real-world training and test data distributions. In practice, these distributions evolve due to frequent changes in the Android ecosystem, adversarial development of new malware families, and the continuous emergence of both benign and malicious applications. Prior studies have shown that such concept drift (distributional shifts in benign and malicious samples) leads to significant degradation in detection performance over time. Despite the practical importance of this issue, existing datasets are often outdated and limited in temporal scope, diversity of malware families, and sample scale, making them insufficient for the systematic evaluation of concept drift in malware detection. To address this gap, we present LAMDA, the largest and most temporally diverse Android malware…
Peer Reviews
Decision: ICLR 2026 Poster
The paper is well-organized. The research topic is significant. The experiments are sufficient.
It lacks a clear comparison with relevant datasets. It lacks specific guidance for future work.
This work is in an area that is now not near my current area of research, thus my lower confidence score. The dataset is large and to the best of my knowledge the longest longitudinal malware dataset collected to date. The analysis is very thorough.
See questions
Scale and Temporal Scope: Over 1 million samples across 12 years with 1,380 families and 150K singleton samples provide unprecedented temporal coverage and diversity for Android malware research. This addresses a genuine gap in existing datasets.
Comprehensive Drift Analysis: The multi-faceted approach (supervised learning degradation, feature distribution shifts via Jeffreys divergence, feature stability scores, SHAP-based explanation drift, label drift) provides rich evidence for concept
Unclear Scan Consistency: The paper does not specify whether VirusTotal labels were obtained from single-pass or repeated scans. Since detection outcomes can vary across rescans, this ambiguity may introduce label inconsistency.
Lack of Intra-Sample Drift Analysis: The study analyzes global and family-level drift but does not consider intra-sample temporal variation, that is, how the same APK's features might change across time. Such analysis could better capture longitudinal behavior shifts.
Static
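The Jeffreys divergence named in the reviews above is the symmetrized Kullback-Leibler divergence, J(P, Q) = KL(P‖Q) + KL(Q‖P). The sketch below shows one way to compute it for a single binary feature across two time periods. It is only an illustration: this page does not describe how the paper estimates the divergence, so the per-feature Bernoulli treatment, the `eps` clamping, and the function names `jeffreys_divergence` and `bernoulli_jeffreys` are all assumptions rather than the authors' procedure.

```python
import math


def jeffreys_divergence(p, q, eps=1e-12):
    """Jeffreys divergence KL(p||q) + KL(q||p) for two discrete distributions.

    Uses the identity KL(p||q) + KL(q||p) = sum_i (p_i - q_i) * log(p_i / q_i).
    Probabilities are clamped to `eps` (an assumed smoothing choice) so that
    zero entries do not produce log(0) or division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        pi = max(pi, eps)
        qi = max(qi, eps)
        total += (pi - qi) * math.log(pi / qi)
    return total


def bernoulli_jeffreys(rate_a, rate_b):
    """Divergence for one binary feature, given the fraction of samples in
    which it is present in period A (rate_a) and in period B (rate_b)."""
    return jeffreys_divergence([rate_a, 1.0 - rate_a], [rate_b, 1.0 - rate_b])


if __name__ == "__main__":
    # Hypothetical rates: a feature present in 10% of apps in one year
    # and in 40% of apps in a later year.
    print(bernoulli_jeffreys(0.10, 0.40))
```

The divergence is zero when the two rates match and grows as they separate. Unlike plain KL divergence it is symmetric, so the result does not depend on which period is treated as the reference.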
Taxonomy
Topics: Data Stream Mining Techniques · Caching and Content Delivery · Peer-to-Peer Network Technologies
