Detection of Illicit Content on Online Marketplaces using Large Language Models
Quoc Khoa Tran, Thanh Thi Nguyen, Campbell Wilson

TL;DR
This paper evaluates large language models, specifically Llama 3.2 and Gemma 3, for detecting illicit content on online marketplaces. The LLMs perform comparably to traditional models on binary classification, and Llama 3.2 outperforms the baselines on the more complex multi-class, imbalanced task.
Contribution
The paper applies fine-tuned LLMs to illicit content detection and reports significant improvements over baseline models in multi-class classification.
Findings
LLMs perform comparably to traditional models in binary classification.
Llama 3.2 outperforms baselines in multi-class, imbalanced classification.
Fine-tuning enhances LLMs' effectiveness in illicit content detection.
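The gap between the binary and the multi-class, imbalanced findings is easier to read with a class-sensitive metric in mind. The sketch below is an illustration only, not the paper's evaluation code: this summary does not say which metrics the authors report, so the use of macro-averaged F1 is an assumption, and the labels ("legal", "drugs", "counterfeit") are hypothetical toy categories rather than the DUTA10K label set. It shows how a classifier that always predicts the majority class can score high accuracy while failing every minority class.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 8 majority-class items, 2 minority-class items.
y_true = ["legal"] * 8 + ["drugs", "counterfeit"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["legal"] * 10

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={macro:.3f}")
```

On this toy data accuracy stays high because the majority class dominates, while macro F1 drops sharply because both minority classes score zero. A model can therefore look competitive on a coarse task and still separate poorly on an imbalanced multi-class one.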
Abstract
Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes. Traditional content moderation methods such as manual reviews and rule-based automated systems struggle with scalability, dynamic obfuscation techniques, and multilingual content. Conventional machine learning models, though effective in simpler contexts, often falter when confronting the semantic complexities and linguistic nuances characteristic of illicit marketplace communications. This research investigates the efficacy of Large Language Models (LLMs), specifically Meta's Llama 3.2 and Google's Gemma 3, in detecting and classifying illicit online marketplace content using the multilingual DUTA10K dataset. Employing fine-tuning techniques such as Parameter-Efficient Fine-Tuning (PEFT) and…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Cybercrime and Law Enforcement Studies
