# Integrating NLP and Ensemble Learning into Next-Generation Firewalls for Robust Malware Detection in Edge Computing

**Authors:** Ramahlapane Lerato Moila, Mthulisi Velempini

PMC · DOI: 10.3390/s26020424 · Sensors (Basel, Switzerland) · 2026-01-09

## TL;DR

A next-generation firewall model that combines NLP with ensemble learning detects malware in edge computing environments with 95% to 98% accuracy.

## Contribution

A novel NLP–ensemble model integrated into next-generation firewalls for robust malware detection in edge environments.

## Key findings

- The model achieves 95% accuracy on a cyber threat intelligence dataset and 98% accuracy on the CSE-CIC-IDS2018 dataset.
- Synthetic data generation effectively addresses class imbalance, improving detection of malicious traffic.
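The summary does not say which synthetic data generator the authors used. As a hedged illustration only, the sketch below shows one common technique, SMOTE-style interpolation, in plain Python: each synthetic minority (malicious) sample is placed at a random point on the line segment between a real minority sample and one of its k nearest minority neighbours. The function `smote_like_oversample` and its parameters are hypothetical names chosen for this example.

```python
import random
from typing import List, Sequence

Vector = List[float]


def smote_like_oversample(minority: Sequence[Vector], n_new: int,
                          k: int = 3, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: create n_new synthetic minority samples by
    interpolating between a sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducibility
    k = min(k, len(minority) - 1)      # cannot have more neighbours than other samples
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples sorted by squared Euclidean distance to `base`
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()             # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
print(smote_like_oversample(minority, n_new=5, k=2))
```

Because every new point lies between two real malicious samples, the minority class is enlarged without simply duplicating rows. For real feature matrices, the imbalanced-learn library offers `SMOTE` with a `fit_resample(X, y)` method.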

## Abstract

This study proposes a novel NLP–ensemble model for next-generation firewalls that achieves 95% to 98% accuracy in detecting malware within edge computing environments. By using synthetic data generation to address class imbalance, the model substantially improves the detection of malicious network traffic and provides a scalable, intelligent defense layer for resource-constrained systems.

What are the main findings?
- The proposed NLP–ensemble model achieves 95% and 98% accuracy on the cyber threat intelligence and CSE-CIC-IDS2018 datasets, respectively.
- Synthetic data generation mitigates class imbalance, substantially improving detection of the minority class (malicious traffic).

What is the implication of the main finding?
- Provides a scalable, intelligent defense layer optimized for resource-constrained edge environments.
- Demonstrates a practical pathway for integrating AI-driven, language-based threat recognition into existing NGFW architectures.

As edge computing becomes increasingly central to modern digital infrastructure, it also opens the door to sophisticated malware attacks that traditional security systems struggle to address. This study proposes a natural language processing (NLP) framework, combined with ensemble learning and integrated into next-generation firewalls (NGFWs), to detect and mitigate malware attacks in edge computing environments. The approach exploits unstructured threat intelligence (e.g., cybersecurity reports and logs) by applying NLP techniques such as TF-IDF vectorization to convert textual data into structured representations, uncovering hidden patterns and entity relationships within system logs. By combining Random Forest (RF) and Logistic Regression (LR) in a soft voting ensemble, the proposed model achieves 95% accuracy on a cyber threat intelligence dataset augmented with synthetic data to address class imbalance, and 98% accuracy on the CSE-CIC-IDS2018 dataset. The results were validated with ANOVA, to assess statistical robustness, and with confusion matrix analysis; both confirmed low error rates. The system improves detection rates and adaptability, providing a scalable defense layer optimized for resource-constrained, latency-sensitive edge environments.
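To make the TF-IDF step concrete, here is a minimal, dependency-free sketch of how log lines could be turned into weighted term vectors. It assumes scikit-learn-style defaults (lowercasing, tokens of two or more word characters, raw term counts, smoothed IDF `ln((1 + N) / (1 + df)) + 1`, and L2 normalization); the sample log strings and the `tfidf` helper are hypothetical and are not the authors' pipeline.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TOKEN = re.compile(r"\b\w\w+\b")  # assumption: words of 2+ word characters


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text.lower())


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Hypothetical helper: return one L2-normalized {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms present in every document still get weight > 0
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors: List[Dict[str, float]] = []
    for toks in tokenized:
        counts = Counter(toks)                      # raw term frequency
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


logs = [
    "Failed login from 10.0.0.5",
    "Malware beacon to c2 server",
    "Failed login failed password",
]
for vec in tfidf(logs):
    print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

Terms that occur in only one line, such as `malware`, receive a higher IDF than terms shared across lines, such as `failed`, which is what lets rare indicators stand out in the resulting feature vectors.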
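Soft voting averages the class probabilities predicted by each base model and chooses the class with the highest average. The sketch below assumes the RF and LR models have already produced per-sample probabilities (the numbers are made up for illustration) and shows how the ensemble can overrule a weakly confident member; the `soft_vote` helper and its optional `weights` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

ProbMatrix = Sequence[Sequence[float]]  # rows: samples, columns: classes


def soft_vote(model_probs: Sequence[ProbMatrix],
              weights: Optional[Sequence[float]] = None) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical helper: weighted average of class probabilities, then argmax."""
    m = len(model_probs)
    weights = [1.0] * m if weights is None else list(weights)
    if len(weights) != m:
        raise ValueError("provide exactly one weight per model")
    total = sum(weights)
    n_samples, n_classes = len(model_probs[0]), len(model_probs[0][0])
    avg = [[0.0] * n_classes for _ in range(n_samples)]
    for w, probs in zip(weights, model_probs):
        for i in range(n_samples):
            for c in range(n_classes):
                avg[i][c] += w * probs[i][c] / total
    # Predicted label = class with the highest averaged probability
    labels = [max(range(n_classes), key=row.__getitem__) for row in avg]
    return labels, avg


# Columns: [benign, malicious]; illustrative outputs of two fitted models
rf_probs = [[0.55, 0.45], [0.70, 0.30]]
lr_probs = [[0.10, 0.90], [0.60, 0.40]]
labels, avg = soft_vote([rf_probs, lr_probs])
print(labels, avg)
```

For the first sample RF leans weakly benign while LR is confident the traffic is malicious, so the averaged probabilities favour "malicious". In scikit-learn the same averaging over `predict_proba` outputs is performed by `VotingClassifier(estimators=[("rf", RandomForestClassifier()), ("lr", LogisticRegression())], voting="soft")`.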
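The summary does not state which groups the ANOVA compared. As a hypothetical illustration, the sketch computes the one-way ANOVA F-statistic, the ratio of between-group to within-group mean squares, for groups such as per-fold accuracy scores of different models. The helper name is an assumption; obtaining a p-value also requires the F distribution, and `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
from typing import Sequence


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """Hypothetical helper: F = MS_between / MS_within for k independent groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Spread of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when every group has zero variance")
    ms_between = ss_between / (k - 1)   # degrees of freedom: k - 1
    ms_within = ss_within / (n - k)     # degrees of freedom: n - k
    return ms_between / ms_within


print(one_way_anova_f([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

A large F indicates that differences between group means are large relative to the variation inside each group; an F near zero means the groups are statistically indistinguishable by this test.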
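Confusion matrix analysis matters especially under class imbalance, because accuracy alone can hide missed attacks. The sketch below, with hypothetical helper names and toy labels (1 = malicious, 0 = benign), derives accuracy, precision, recall, and F1 from the four confusion-matrix counts, and shows that a model predicting "benign" for everything still scores 90% accuracy on a 9:1 imbalanced sample while catching no malware.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Hypothetical helper: count TP, FP, TN, FN (positive = malicious)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0  # define 0/0 as 0.0


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    precision = _ratio(c["tp"], c["tp"] + c["fp"])
    recall = _ratio(c["tp"], c["tp"] + c["fn"])  # detection rate on malicious traffic
    return {
        "accuracy": _ratio(c["tp"] + c["tn"], sum(c.values())),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# A "lazy" model that labels every flow benign on a 9:1 imbalanced sample
lazy = classification_metrics(confusion_counts([0] * 9 + [1], [0] * 10))
print(lazy["accuracy"], lazy["recall"])  # 0.9 0.0
```

Reporting recall on the minority class alongside accuracy is what exposes this failure mode, which is why class-imbalance handling and confusion matrix analysis go together.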

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12845531/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12845531/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC12845531/full.md

---
Source: https://tomesphere.com/paper/PMC12845531