Short Text Classification Approach to Identify Child Sexual Exploitation Material
Mhd Wesam Al-Nabki, Eduardo Fidalgo, Enrique Alegre, Rocío Alaiz-Rodríguez

TL;DR
This paper introduces and compares two short text classification methods to efficiently identify Child Sexual Exploitation Material files by analyzing file names and paths, aiding law enforcement in faster evidence detection.
Contribution
It proposes novel approaches using character n-grams and logistic regression to classify CSEM files based on obfuscated short text, improving speed and accuracy over manual inspection.
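As a rough illustration of the technique named above, the following sketch extracts character n-grams from short strings and fits a logistic regression on them with plain stochastic gradient descent. It is not the authors' implementation: the n-gram range (2 to 3), the learning rate, the epoch count, and the toy file names (which use "keyword" as a neutral stand-in for an obfuscated term) are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Count every character n-gram of length n_min..n_max in lowercased text."""
    text = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def predict_proba(weights, bias, feats):
    """Logistic model: sigmoid of the weighted sum of n-gram counts."""
    z = bias + sum(weights.get(g, 0.0) * c for g, c in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(samples, labels, epochs=200, lr=0.05):
    """Fit weights with stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    feats = [char_ngrams(s) for s in samples]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            err = predict_proba(weights, bias, f) - y  # gradient of log loss w.r.t. z
            bias -= lr * err
            for g, c in f.items():
                weights[g] = weights.get(g, 0.0) - lr * err * c
    return weights, bias


# Hypothetical toy data: 1 = name contains an obfuscated placeholder term.
NAMES = [
    "k3yw0rd_set01.jpg", "keyw.ord-clip.avi", "kkeyword_02.mpg",
    "holiday_beach.jpg", "invoice_2019.pdf", "lecture_notes.txt",
]
LABELS = [1, 1, 1, 0, 0, 0]

WEIGHTS, BIAS = train_logreg(NAMES, LABELS)

# Score an unseen, differently obfuscated variant of the placeholder term.
score = predict_proba(WEIGHTS, BIAS, char_ngrams("keyw0rd_03.jpg"))
```

Character n-grams suit this setting because an obfuscated spelling still shares most of its short substrings with the original term, whereas a word-level vocabulary would treat each variant as an unknown token.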
Findings
Achieved an average class recall of 0.98.
Compared two approaches: two separate classifiers, one for the file name and one for the path, versus a single classifier that handles both the file name and the path.
Potential integration into forensic tools to support law enforcement efforts.
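The summary does not define "average class recall". Assuming it means the unweighted mean of the per-class recalls (often called macro-averaged recall), it could be computed as in this sketch; the function name and the example labels are hypothetical.

```python
def average_class_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed meaning of the metric)."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical labels: 4 positive files, 6 negative files, one positive missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
acr = average_class_recall(y_true, y_pred)  # (6/6 + 3/4) / 2 = 0.875
```

Unlike plain accuracy (0.9 on these labels), this metric gives each class equal weight, so a missed item in the smaller class costs more.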
Abstract
Producing or sharing Child Sexual Exploitation Material (CSEM) is a serious crime fought vigorously by Law Enforcement Agencies (LEAs). When an LEA seizes a computer from a potential producer or consumer of CSEM, it needs to analyze the files on the suspect's hard disk for evidence. However, manually inspecting file content for CSEM is a time-consuming task. In most cases, it is infeasible in the amount of time the Spanish police have available under a search warrant. Instead of analyzing file content, another approach that can speed up the process is to identify CSEM by analyzing the file names and their absolute paths. The main challenge for this task lies in dealing with short text that the owners of this material distort deliberately, using obfuscated words and user-defined naming patterns. This paper presents and compares two approaches…
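To make the two input signals concrete, here is a minimal sketch of how an absolute path might be separated into a file name and a directory path before classification. The helper name and the example path are hypothetical, and the source does not say how the authors split or combine these strings.

```python
from pathlib import PureWindowsPath


def split_name_and_path(absolute_path):
    """Return (file name, parent directory) for an absolute path.

    PureWindowsPath accepts both backslash and forward-slash separators
    and never touches the real file system.
    """
    p = PureWindowsPath(absolute_path)
    return p.name, str(p.parent)


# Hypothetical, benign example path.
name, folder = split_name_and_path(r"C:\Users\u\Pictures\trip_2019\img_001.jpg")

# Assumed inputs: separate classifiers would score `name` and `folder`
# independently, while a single classifier would score one combined string.
combined = folder + "\\" + name
```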
Taxonomy
Methods: Logistic Regression
