Consiglieres in the Shadow: Understanding the Use of Uncensored Large Language Models in Cybercrimes
Zilong Lin, Zichuan Li, Xiaojing Liao, XiaoFeng Wang

TL;DR
This paper systematically identifies and analyzes uncensored large language models used in cybercrime. It reveals their scale, their malicious capabilities, and how widely they are shared underground, all of which raise urgent security concerns.
Contribution
First systematic study of uncensored LLMs (ULLMs), using a knowledge graph and graph-based deep learning to discover over 11,000 ULLMs from a small set of labeled examples.
Findings
Over 11,000 uncensored LLMs identified, some with over 19 million installs.
Many models are capable of generating harmful content such as hate speech and malicious code.
Criminals share techniques and scripts for building malicious LLMs in underground forums.
Abstract
The advancement of AI technologies, particularly Large Language Models (LLMs), has transformed computing while introducing new security and privacy risks. Prior research shows that cybercriminals are increasingly leveraging uncensored LLMs (ULLMs) as backends for malicious services. Understanding these ULLMs has been hindered by the challenge of identifying them among the vast number of open-source LLMs hosted on platforms like Hugging Face. In this paper, we present the first systematic study of ULLMs, overcoming this challenge by modeling relationships among open-source LLMs and between them and related data, such as fine-tuning, merging, compressing models, and using or generating datasets with harmful content. Representing these connections as a knowledge graph, we applied graph-based deep learning to discover over 11,000 ULLMs from a small set of labeled examples and uncensored…
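The abstract describes representing relationships among models and datasets as a knowledge graph and then spreading a few known labels across it. The sketch below illustrates that general idea only. It swaps in plain label propagation (neighbor averaging) for the graph-based deep learning the authors used, and every node name, relation name, and seed score is a hypothetical placeholder rather than data from the paper.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples; all names are invented.
EDGES = [
    ("base-model", "fine_tuned_into", "model-a"),
    ("model-a", "merged_into", "model-b"),
    ("model-a", "quantized_into", "model-c"),
    ("dataset-x", "used_to_train", "model-a"),
    ("base-model", "fine_tuned_into", "model-d"),
]

# Assumed seed labels: 1.0 = known harmful/uncensored, 0.0 = known censored.
SEEDS = {"dataset-x": 1.0, "base-model": 0.0}


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring the relation type."""
    adj = defaultdict(set)
    for src, _relation, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def propagate(adj, seeds, iterations=20):
    """Average neighbor scores repeatedly while keeping seed scores fixed."""
    scores = {node: seeds.get(node, 0.5) for node in adj}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adj.items():
            if node in seeds:
                updated[node] = seeds[node]  # seeds stay clamped
            else:
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
        scores = updated
    return scores


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    for name, score in sorted(propagate(graph, SEEDS).items()):
        print(f"{name}: {score:.2f}")
```

In this toy graph, nodes connected to the harmful dataset end up with higher scores than a node connected only to the censored base model. A learned graph model would additionally use relation types and node features, which this sketch deliberately ignores.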
Taxonomy
Topics: Cybercrime and Law Enforcement Studies · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling
