Detection of Illicit Content on Online Marketplaces using Large Language Models

Quoc Khoa Tran; Thanh Thi Nguyen; Campbell Wilson

arXiv:2603.04707·cs.CL·March 6, 2026

Open Access

TL;DR

This paper evaluates how well large language models, specifically Llama 3.2 and Gemma 3, detect illicit content on online marketplaces. The models perform comparably to traditional models on binary classification, and Llama 3.2 outperforms the baselines on the more complex multi-class task.

Contribution

It introduces the application of LLMs with fine-tuning techniques for illicit content detection, showing significant improvements in multi-class classification over baseline models.

Findings

01. LLMs perform comparably to traditional models in binary classification.

02. Llama 3.2 outperforms baselines in multi-class, imbalanced classification.

03. Fine-tuning enhances LLMs' effectiveness in illicit content detection.
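
The second finding turns on class imbalance, so it may help to see why plain accuracy can mislead in that setting. The sketch below is illustrative only: the labels, the toy data, and the choice of macro-averaged F1 are assumptions made for this example, not metrics or categories taken from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, heavily imbalanced toy labels (not from DUTA10K).
y_true = ["legal"] * 8 + ["drugs", "counterfeit"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["legal"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
print("per class:", per_class_f1(y_true, y_pred))
```

On this toy data the majority-class predictor reaches 0.8 accuracy while scoring zero F1 on both rare classes, which is why comparisons on imbalanced multi-class data are usually read per class or with a macro average rather than from accuracy alone.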

Abstract

Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes. Traditional content moderation methods such as manual reviews and rule-based automated systems struggle with scalability, dynamic obfuscation techniques, and multilingual content. Conventional machine learning models, though effective in simpler contexts, often falter when confronting the semantic complexities and linguistic nuances characteristic of illicit marketplace communications. This research investigates the efficacy of Large Language Models (LLMs), specifically Meta's Llama 3.2 and Google's Gemma 3, in detecting and classifying illicit online marketplace content using the multilingual DUTA10K dataset. Employing fine-tuning techniques such as Parameter-Efficient Fine-Tuning (PEFT) and…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Cybercrime and Law Enforcement Studies