Real-PGDN: A Two-level Classification Method for Full-Process Recognition of Newly Registered Pornographic and Gambling Domain Names
Hao Wang, Yingshuo Wang, Junang Gan, Yanan Cheng, Jinshuai Zhang

TL;DR
This paper presents Real-PGDN, a two-level classification system that identifies newly registered pornographic and gambling domain names from freshly crawled real-world data, reporting 97.88% classification precision and evaluating its effect in actual application scenarios.
Contribution
The paper introduces a two-level classification method that integrates BERT-based (CoSENT) and traditional algorithms, supported by comprehensive real-data crawling and feature extraction that tolerates missing features, to improve PGDN detection.
Findings
Achieved 97.88% precision in classifying PGDN
Maintains over 70% forecast precision for delayed domain usage
Constructed the large-scale NRD2024 dataset from 20 days of data
Abstract
Online pornography and gambling have consistently posed regulatory challenges for governments, threatening both personal assets and privacy. It is therefore imperative to research the classification of newly registered Pornographic and Gambling Domain Names (PGDN). However, scholarly investigation into this topic is limited. Some previous efforts in PGDN classification pursue high accuracy using ideal sample data, while others employ up-to-date data from real-world scenarios but achieve lower classification accuracy. This paper introduces the Real-PGDN method, which covers the complete process: timely and comprehensive real-data crawling, feature extraction with feature-missing tolerance, precise PGDN classification, and assessment of application effects in actual scenarios. Our two-level classifier, which integrates CoSENT (BERT-based), Multilayer Perceptron (MLP), and…
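The idea of a two-level classifier that tolerates missing features can be illustrated with a toy sketch. This is not the authors' implementation: the truncated abstract does not specify the architecture, so everything here is an assumption. A keyword-overlap scorer stands in for a learned text model such as CoSENT, a renormalized weighted average stands in for a trained second-level model such as an MLP, and the names `SUSPICIOUS_TERMS`, `WEIGHTS`, `text_score`, and `classify`, as well as the feature names and the threshold, are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical keyword list; a stand-in for a learned text model such as CoSENT.
SUSPICIOUS_TERMS = {"casino", "bet", "poker", "jackpot", "adult"}

# Hypothetical per-feature weights; a stand-in for a trained second-level model.
WEIGHTS: Dict[str, float] = {"title": 0.5, "body": 0.3, "domain": 0.2}


def text_score(text: Optional[str]) -> Optional[float]:
    """Level 1: score one text feature in [0, 1], or None if it is missing."""
    if not text:
        return None
    tokens = text.lower().split()
    if not tokens:
        return None
    hits = sum(1 for token in tokens if token in SUSPICIOUS_TERMS)
    return hits / len(tokens)


def classify(features: Dict[str, Optional[str]],
             threshold: float = 0.25) -> Optional[bool]:
    """Level 2: combine whichever level-1 scores are available.

    Weights are renormalized over the features that are present, so a
    missing feature does not pull the combined score toward zero.
    Returns None when no feature is available at all.
    """
    scores = {name: text_score(features.get(name)) for name in WEIGHTS}
    present = {name: s for name, s in scores.items() if s is not None}
    if not present:
        return None
    total_weight = sum(WEIGHTS[name] for name in present)
    combined = sum(WEIGHTS[name] * s for name, s in present.items()) / total_weight
    return combined >= threshold
```

For example, a record whose page body could not be crawled is still classified from its title and domain text alone, while a record with no usable features yields `None` instead of a forced label.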
Taxonomy
Topics: Sexuality, Behavior, and Technology · Spam and Phishing Detection · Cybercrime and Law Enforcement Studies
