# Cross-Domain Malware Detection via Probability-Level Fusion of Lightweight Gradient Boosting Models

**Authors:** Omar Khalid Ali Mohamed

arXiv: 2509.00476 · 2025-09-03

## TL;DR

This paper introduces a lightweight framework that trains a separate LightGBM classifier on each of three malware datasets and fuses their predicted probabilities with grid-searched weights, aiming to improve cross-domain detection at low computational cost.

## Contribution

The paper proposes a probability-level fusion approach that combines predictions from LightGBM models trained on static (EMBER), behavioral (API Call Sequences), and memory (CIC Obfuscated Memory) data, improving cross-domain generalization at low computational cost.

## Key findings

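- The fusion model achieves a macro F1-score of 0.823 on a cross-domain validation set.
- It outperforms each individual single-dataset model in cross-domain detection.
- It maintains low computational overhead, which the authors present as suitable for real-time deployment.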
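
For readers unfamiliar with the reported metric, the standard definition of the macro F1-score is given here. This summary does not spell the formula out, so the notation ($C$ classes, per-class counts $TP_c$, $FP_c$, $FN_c$) is an assumption added for illustration, not taken from the paper.

$$
F1_c = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c},
\qquad
\text{Macro-F1} = \frac{1}{C} \sum_{c=1}^{C} F1_c
$$

Because every class contributes equally to the average, a model cannot score well by favoring only the majority class.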

## Abstract

The escalating sophistication of malware necessitates robust detection mechanisms that generalize across diverse data sources. Traditional single-dataset models struggle with cross-domain generalization and often incur high computational costs. This paper presents a novel, lightweight framework for malware detection that employs probability-level fusion across three distinct datasets: EMBER (static features), API Call Sequences (behavioral features), and CIC Obfuscated Memory (memory patterns). Our method trains individual LightGBM classifiers on each dataset, selects top predictive features to ensure efficiency, and fuses their prediction probabilities using optimized weights determined via grid search. Extensive experiments demonstrate that our fusion approach achieves a macro F1-score of 0.823 on a cross-domain validation set, significantly outperforming individual models and providing superior generalization. The framework maintains low computational overhead, making it suitable for real-time deployment, and all code and data are provided for full reproducibility.
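
The fusion step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes that each of the three models emits a malware probability for the same validation samples, that the weights are non-negative and sum to 1, that the grid step is 0.1, and that the decision threshold is 0.5. The function names (`macro_f1`, `fuse`, `grid_search_weights`) and the synthetic probabilities are hypothetical stand-ins for the outputs of the trained LightGBM classifiers.

```python
import itertools

import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores for binary labels {0, 1}."""
    scores = []
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def fuse(probs, weights):
    """Weighted average of per-model probabilities.

    probs has shape (n_models, n_samples); weights has shape (n_models,).
    """
    return np.asarray(weights) @ np.asarray(probs)


def grid_search_weights(probs, y_true, step=0.1, threshold=0.5):
    """Try every weight vector on a simplex grid and keep the best macro F1."""
    n_models = len(probs)
    ticks = round(1 / step)
    best_w, best_score = None, -1.0
    for combo in itertools.product(range(ticks + 1), repeat=n_models - 1):
        if sum(combo) > ticks:
            continue  # the remaining weight would be negative
        w = np.array(list(combo) + [ticks - sum(combo)]) / ticks
        pred = (fuse(probs, w) >= threshold).astype(int)
        score = macro_f1(y_true, pred)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Synthetic stand-ins for three models' validation probabilities (assumption).
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=500)
noise_levels = (0.35, 0.45, 0.55)
probs = np.stack(
    [np.clip(y_val + rng.normal(0, s, size=y_val.size), 0, 1) for s in noise_levels]
)

best_w, best_score = grid_search_weights(probs, y_val)
print("weights:", best_w, "macro F1:", round(best_score, 3))
```

Since the grid includes the weight vectors that put all mass on a single model, the fused score on the validation set used for the search can never fall below the best individual model's score. Whether that advantage holds on unseen data depends on evaluating the chosen weights on a separate held-out split.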

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00476/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00476/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/2509.00476/full.md

---
Source: https://tomesphere.com/paper/2509.00476