Phishing Detection System: An Ensemble Approach Using Character-Level CNN and Feature Engineering

Rudra Dubey; Arpit Mani Tripathi; Archit Srivastava; Sarvpal Singh

arXiv:2512.16717 · cs.LG · December 19, 2025

TL;DR

This paper introduces an ensemble AI system that combines a character-level CNN with feature engineering for highly accurate, real-time phishing URL detection; the ensemble outperforms its individual models while keeping false positives low.

Contribution

The paper presents a novel ensemble approach that integrates a character-level CNN and LightGBM with engineered features to improve phishing detection accuracy.
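The page does not state how the CNN and LightGBM outputs are merged. One common option is soft voting: a weighted average of the two models' predicted phishing probabilities, compared against a decision threshold. The sketch below assumes that scheme; the function names `ensemble_probability` and `classify`, the equal default weight, and the 0.5 threshold are hypothetical and not taken from the paper.

```python
# Soft-voting sketch. ASSUMPTION: the paper's actual combination rule and
# weights are not given on this page; equal weights are illustrative only.

def ensemble_probability(p_cnn: float, p_lgbm: float, w_cnn: float = 0.5) -> float:
    """Weighted average of the CNN and LightGBM phishing probabilities."""
    if not 0.0 <= w_cnn <= 1.0:
        raise ValueError("w_cnn must lie in [0, 1]")
    return w_cnn * p_cnn + (1.0 - w_cnn) * p_lgbm


def classify(p_cnn: float, p_lgbm: float, w_cnn: float = 0.5,
             threshold: float = 0.5) -> str:
    """Label a URL 'phishing' when the combined probability reaches the threshold."""
    score = ensemble_probability(p_cnn, p_lgbm, w_cnn)
    return "phishing" if score >= threshold else "legitimate"


if __name__ == "__main__":
    # The CNN is confident and LightGBM is not; the average (0.66) still
    # crosses the hypothetical 0.5 threshold.
    print(classify(0.92, 0.40))  # phishing
```

Raising `threshold` above 0.5 trades recall for precision, which is one lever a deployment could use to keep false positives low.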

Findings

01

Achieved 99.819% accuracy on a test set of 19,873 URLs

02

The ensemble outperforms each of its individual models in detection

03

The system operates in real time, with no false positives on the test set (100% precision)
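Accuracy, precision, and recall are all derived from the same confusion matrix, which makes the link between the abstract's 100 percent precision and false positives concrete: precision is TP / (TP + FP), so it equals 1 only when FP = 0, regardless of how many phishing URLs are missed. The sketch below computes the three metrics from hypothetical counts; they are not the paper's confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # phishing URLs correctly flagged
    fp: int  # legitimate URLs wrongly flagged as phishing
    tn: int  # legitimate URLs correctly passed
    fn: int  # phishing URLs that were missed


def metrics(c: ConfusionCounts) -> dict:
    """Accuracy, precision and recall; empty denominators yield 0.0."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Hypothetical counts: with fp == 0, precision is exactly 1.0 even
    # though 5 phishing URLs were missed (recall < 1).
    counts = ConfusionCounts(tp=95, fp=0, tn=100, fn=5)
    print(metrics(counts))
    # {'accuracy': 0.975, 'precision': 1.0, 'recall': 0.95}
```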

Abstract

Phishing attacks remain one of the most prevalent cybersecurity risks today, with malicious actors constantly changing their strategies to deceive users. This paper presents an AI-based phishing detection system that uses an ensemble approach to combine a character-level Convolutional Neural Network (CNN) and LightGBM with engineered features. After extracting 36 lexical, structural, and domain-based features from the URLs, the system uses a character-level CNN to extract sequential features. On a test dataset of 19,873 URLs, the ensemble model achieves an accuracy of 99.819%, precision of 100%, recall of 99.635%, and ROC-AUC of 0.99947. The proposed system has been deployed as a FastAPI-based service with an intuitive user interface to provide real-time detection. In contrast, the results demonstrate that the…
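The abstract describes two views of each URL: a vector of engineered features and the raw character sequence fed to the CNN. The sketch below shows one way to produce both with only the Python standard library. The nine features, the character vocabulary, and `max_len=200` are illustrative assumptions; the paper's actual 36 features are not listed on this page.

```python
import re
from urllib.parse import urlparse

# ASSUMPTION: hypothetical vocabulary. Index 0 = padding, 1 = unknown char.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(VOCAB)}
PAD, UNK = 0, 1

# Loose dotted-quad pattern (does not check that each octet is <= 255).
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def encode_chars(url: str, max_len: int = 200) -> list[int]:
    """Map characters to indices, truncating or right-padding to max_len (CNN input)."""
    ids = [CHAR_TO_INDEX.get(ch, UNK) for ch in url.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def lexical_features(url: str) -> dict:
    """A handful of illustrative lexical/structural features (not the paper's 36)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(bool(IPV4.match(host))),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    url = "http://192.168.0.1/login-verify/account@secure"
    # host_is_ip and has_at_symbol are both 1 for this URL.
    print(lexical_features(url))
    print(encode_chars(url, max_len=16))
```

Padding every URL to a fixed length lets variable-length URLs be batched for the CNN, and reserving indices 0 and 1 keeps padding and unseen characters distinct from real characters in an embedding layer.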

Taxonomy

Topics: Spam and Phishing Detection · Cybercrime and Law Enforcement Studies · Misinformation and Its Impacts