A Hybrid Deep Learning and Anomaly Detection Framework for Real-Time Malicious URL Classification
Berkani Khaled, Zeraoulia Rafik

TL;DR
This paper introduces a hybrid deep learning framework that combines feature extraction, anomaly detection, and neural classification to achieve fast, accurate, and scalable real-time malicious URL detection with multilingual support.
Contribution
It presents a novel multi-stage pipeline that integrates feature hashing, SMOTE oversampling, Isolation Forest anomaly filtering, and a neural network classifier for efficient real-time URL classification, outperforming CNN and SVM baselines in speed and accuracy.
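The four stages named above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation. The following choices are all assumptions: character 3-5-grams, 2**12 hash buckets, a hand-rolled SMOTE used in place of imbalanced-learn's, SMOTE running before Isolation Forest (following the order the abstract lists them), "anomaly filtering" read as dropping training rows the forest flags as outliers, a 5% contamination rate, and a single 64-unit hidden layer. `smote`, `train`, and `predict` are hypothetical helper names.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier

# Hypothetical hyperparameters: the paper's actual n-gram range, hash size,
# contamination rate, and network shape are not stated in this summary.
VECTORIZER = HashingVectorizer(
    analyzer="char", ngram_range=(3, 5), n_features=2**12, alternate_sign=False
)


def smote(X, y, minority=1, k=5, seed=0):
    """Minimal SMOTE: synthesize minority rows by interpolating toward k nearest neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum()) - len(X_min)  # rows needed to balance the classes
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]  # column 0 is (normally) the point itself
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synth = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return np.vstack([X, synth]), np.concatenate([y, np.full(n_new, minority)])


def train(urls, labels, contamination=0.05, seed=0):
    """Hash URLs, balance with SMOTE, drop Isolation Forest outliers, fit an MLP."""
    X = VECTORIZER.transform(urls).toarray()
    y = np.asarray(labels)
    X, y = smote(X, y, minority=1, seed=seed)
    # Assumed reading of "anomaly filtering": keep only rows predicted as inliers (+1).
    keep = IsolationForest(contamination=contamination, random_state=seed).fit_predict(X) == 1
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed)
    clf.fit(X[keep], y[keep])
    return clf


def predict(clf, urls):
    """Label URLs as 0 (benign) or 1 (malicious) with the fitted classifier."""
    return clf.predict(VECTORIZER.transform(urls).toarray())
```

Usage follows the usual fit/predict pattern: `clf = train(urls, labels)` on a labelled corpus, then `predict(clf, new_urls)`. Because `HashingVectorizer` is stateless, new URLs need no refitted vocabulary, which is one plausible reason a hashing front end suits a real-time setting.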
Findings
Achieves 96.4% accuracy and 95.4% F1-score.
Provides a 20 ms prediction latency.
Outperforms CNN and SVM baselines in speed and accuracy.
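The summary does not say how the 20 ms prediction latency was measured. One common approach, sketched here purely as an assumption, is to time single-URL predictions after a short warm-up and report the median and 95th percentile. `per_url_latency_ms` is a hypothetical helper, and `predict` stands for any one-URL classifier call.

```python
import statistics
import time
from typing import Callable, Sequence


def per_url_latency_ms(predict: Callable[[str], object], urls: Sequence[str], warmup: int = 3) -> dict:
    """Time one prediction per URL and summarize the latencies in milliseconds."""
    for u in urls[:warmup]:
        predict(u)  # warm-up calls, excluded from the measurements
    samples = []
    for u in urls:
        start = time.perf_counter()
        predict(u)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style 95th percentile on the sorted samples.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return {"median_ms": statistics.median(samples), "p95_ms": p95, "n": len(samples)}
```

Reporting a tail percentile alongside the median matters for an interactive tool, since a few slow predictions are what a user actually notices.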
Abstract
Malicious URLs remain a primary vector for phishing, malware, and other cyberthreats. This study proposes a hybrid deep learning framework combining HashingVectorizer n-gram analysis, SMOTE balancing, Isolation Forest anomaly filtering, and a lightweight neural network classifier for real-time URL classification. The multi-stage pipeline processes URLs from open-source repositories using statistical features (length, dot count, entropy), achieving training complexity and a 20 ms prediction latency. Empirical evaluation yields 96.4% accuracy, 95.4% F1-score, and 97.3% ROC-AUC, outperforming CNN (94.8%) and SVM baselines with a -- speedup (Table~\ref{tab:comp-complexity}). A multilingual Tkinter GUI (Arabic/English/French) enables real-time threat assessment with clipboard integration. The framework demonstrates superior scalability and…
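The three statistical features named in the abstract are cheap to compute per URL. A minimal sketch, assuming "entropy" means the Shannon entropy of the URL's character distribution in bits per character; the summary does not define it, and `shannon_entropy` and `url_stats` are hypothetical names.

```python
import math
from collections import Counter


def shannon_entropy(url: str) -> float:
    """Shannon entropy of the URL's character distribution, in bits per character."""
    if not url:
        return 0.0
    n = len(url)
    return -sum((c / n) * math.log2(c / n) for c in Counter(url).values())


def url_stats(url: str) -> dict:
    """Hypothetical helper returning the length, dot count, and entropy features."""
    return {
        "length": len(url),
        "dot_count": url.count("."),  # many dots often signal nested subdomains
        "entropy": shannon_entropy(url),
    }
```

For example, `url_stats("http://example.com")` gives a length of 18 and a dot count of 1. Randomly generated domains and long encoded query strings tend to push the entropy value up, which is the intuition behind using it as a feature.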
Taxonomy
Topics: Spam and Phishing Detection · Cybercrime and Law Enforcement Studies · Misinformation and Its Impacts
