Enhancing web traffic attacks identification through ensemble methods and feature selection
Daniel Urda, Branly Martínez, Nuño Basurto, Meelis Kull, Ángel Arroyo, Álvaro Herrero

TL;DR
This paper reports that ensemble machine learning methods combined with feature selection improve web traffic attack detection over baseline classifiers, reaching an AUC of 0.989 on the CSIC2010 v2 dataset of simulated e-commerce traffic.
Contribution
It introduces a novel framework utilizing ensemble techniques and feature selection to enhance web attack identification accuracy over baseline classifiers.
Findings
Ensemble methods outperform baseline classifiers by ~20% in accuracy.
Achieved an AUC of 0.989 with ensemble models.
Feature selection improves model robustness.
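For readers unfamiliar with the AUC figure quoted above, the following is a minimal sketch of how an Area Under the ROC Curve can be computed from classifier scores, using the pairwise-ranking definition (the probability that a randomly chosen attack scores higher than a randomly chosen normal request). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for attack, 0 for normal traffic (hypothetical encoding).
    scores: higher means "more likely an attack".
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The quadratic pair loop is chosen for readability; for large datasets a sort-based rank computation gives the same value more efficiently.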
Abstract
Websites, as essential digital assets, are highly vulnerable to cyberattacks because of their high traffic volume and the significant impact of breaches. This study aims to enhance the identification of web traffic attacks by leveraging machine learning techniques. A methodology was proposed to extract relevant features from HTTP traces using the CSIC2010 v2 dataset, which simulates e-commerce web traffic. Ensemble methods, such as Random Forest and Extreme Gradient Boosting, were employed and compared against baseline classifiers, including k-Nearest Neighbors, LASSO, and Support Vector Machines. The results demonstrate that the ensemble methods outperform baseline classifiers by approximately 20% in predictive accuracy, achieving an Area Under the ROC Curve (AUC) of 0.989. Feature selection methods such as Information Gain, LASSO, and Random Forest further enhance the robustness of…
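The abstract mentions extracting features from HTTP traces but, as shown here, does not list them. As a rough sketch of what such a step could look like, the snippet below turns one raw HTTP request into a few numeric features. The feature set (`path_length`, `query_length`, `n_params`, `n_suspicious_chars`, `body_length`), the `SUSPICIOUS_CHARS` set, and the sample request are all hypothetical choices made for illustration; they are not the features used in the paper.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical set of characters often seen in injection payloads.
SUSPICIOUS_CHARS = set("'\"<>;()")


def extract_features(raw_request: str) -> dict:
    """Map one raw HTTP request to a small dict of illustrative features."""
    # Split headers from the body at the first blank line.
    head, _, body = raw_request.partition("\r\n\r\n")
    request_line = head.split("\r\n", 1)[0]
    method, target, _version = request_line.split(" ", 2)

    parts = urlsplit(target)
    # Collect parameters from the query string and a form-encoded body.
    params = parse_qsl(parts.query, keep_blank_values=True)
    if body:
        params += parse_qsl(body, keep_blank_values=True)

    # Count suspicious characters in the URL-decoded parameter values.
    n_suspicious = sum(
        1 for _name, value in params for ch in value if ch in SUSPICIOUS_CHARS
    )

    return {
        "method": method,
        "path_length": len(parts.path),
        "query_length": len(parts.query),
        "n_params": len(params),
        "n_suspicious_chars": n_suspicious,
        "body_length": len(body),
    }


# Invented request for illustration; not a record from CSIC2010 v2.
sample = (
    "GET /shop/item.jsp?id=2&name=%27+OR+1%3D1 HTTP/1.1\r\n"
    "Host: example.test\r\n"
    "\r\n"
)
print(extract_features(sample))
```

Feature dictionaries like this one could then be stacked into a table and passed to whichever classifiers and feature selection methods are being compared.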
Taxonomy
Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Network Packet Processing and Optimization
Methods: Feature Selection
