Automated Classification of Cybercrime Complaints using Transformer-based Language Models for Hinglish Texts

Nanda Rani; Divyanshu Singh; Bikash Saha; Sandeep Kumar Shukla

arXiv:2412.16614·cs.CR·December 24, 2024

TL;DR

This paper presents a transformer-based framework for automatically classifying multilingual, code-mixed cybercrime complaints in Hinglish, improving accuracy and scalability for real-world law enforcement applications.

Contribution

It introduces Hinglish-adapted transformers and a data augmentation approach to effectively classify cybercrime complaints, addressing language and privacy challenges.

Findings

01 HingRoBERTa achieved 74.41% accuracy

02 The framework effectively handles code-mixed Hinglish texts

03 The solution is scalable and deployable in real-world platforms

Abstract

The rise in cybercrime and the complexity of multilingual and code-mixed complaints present significant challenges for law enforcement and cybersecurity agencies. These organizations need automated, scalable methods to identify crime types, enabling efficient processing and prioritization of large complaint volumes. Manual triaging is inefficient, and traditional machine learning methods fail to capture the semantic and contextual nuances of textual cybercrime complaints. Moreover, the lack of publicly available datasets, together with privacy concerns, hinders research toward robust solutions. To address these challenges, we propose a framework for automated cybercrime complaint classification. The framework leverages Hinglish-adapted transformers, such as HingBERT and HingRoBERTa, to handle code-mixed inputs effectively. We employ the real-world dataset provided by Indian Cybercrime…

Taxonomy

Topics: Cybercrime and Law Enforcement Studies · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling