Automated Classification of Cybercrime Complaints using Transformer-based Language Models for Hinglish Texts
Nanda Rani, Divyanshu Singh, Bikash Saha, Sandeep Kumar Shukla

TL;DR
This paper presents a transformer-based framework for automatically classifying multilingual, code-mixed cybercrime complaints in Hinglish, improving accuracy and scalability for real-world law enforcement applications.
Contribution
It introduces Hinglish-adapted transformers and a data augmentation approach to effectively classify cybercrime complaints, addressing language and privacy challenges.
Findings
HingRoBERTa achieved 74.41% accuracy
The framework effectively handles code-mixed Hinglish texts
The solution is scalable and deployable on real-world platforms
Abstract
The rise in cybercrime and the complexity of multilingual and code-mixed complaints present significant challenges for law enforcement and cybersecurity agencies. These organizations need automated, scalable methods to identify crime types, enabling efficient processing and prioritization of large complaint volumes. Manual triaging is inefficient, and traditional machine learning methods fail to capture the semantic and contextual nuances of textual cybercrime complaints. Moreover, the lack of publicly available datasets and privacy concerns hinder the development of robust solutions. To address these challenges, we propose a framework for automated cybercrime complaint classification. The framework leverages Hinglish-adapted transformers, such as HingBERT and HingRoBERTa, to handle code-mixed inputs effectively. We employ the real-world dataset provided by Indian Cybercrime…
Taxonomy
Topics: Cybercrime and Law Enforcement Studies · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling
