Detection and Analysis of Sensitive and Illegal Content on the Ethereum Blockchain Using Machine Learning Techniques

Xingyu Feng

arXiv:2512.17411·cs.CR·December 22, 2025

TL;DR

This paper presents a machine learning approach for detecting sensitive and illegal content (text, images, and files) stored on the Ethereum blockchain, and highlights the privacy and security concerns that such content raises.

Contribution

It introduces a data identification and restoration algorithm for content embedded in the Ethereum blockchain, then applies FastText sentiment analysis to the recovered texts and NSFWJS image detection to the recovered images to identify illicit material.
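The summary does not describe how the identification step works, so the following is only a minimal sketch of one common way to recognize embedded content: decode a transaction's hex `input` field and compare its leading bytes against known file signatures. The signature table and the function names `decode_input_data` and `identify_payload` are hypothetical and are not taken from the paper.

```python
from typing import Optional

# Hypothetical signature table: a handful of well-known file magic numbers.
# A real pipeline would need a much larger table; this is an assumption.
MAGIC_NUMBERS = {
    bytes.fromhex("89504e470d0a1a0a"): "png",
    bytes.fromhex("ffd8ff"): "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def decode_input_data(hex_input: str) -> bytes:
    """Convert a transaction's hex input string (with or without 0x) to bytes."""
    cleaned = hex_input[2:] if hex_input.lower().startswith("0x") else hex_input
    return bytes.fromhex(cleaned)


def identify_payload(hex_input: str) -> Optional[str]:
    """Return a coarse content label, or None if nothing is recognized."""
    payload = decode_input_data(hex_input)

    # 1. Known binary formats: match on the leading signature bytes.
    for magic, kind in MAGIC_NUMBERS.items():
        if payload.startswith(magic):
            return kind

    # 2. Otherwise, treat the payload as text only if it is valid UTF-8
    #    and fully printable (newlines and control bytes are rejected here).
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "text" if text and text.isprintable() else None


if __name__ == "__main__":
    print(identify_payload("0x" + b"hello world".hex()))
    print(identify_payload("0x89504e470d0a1a0a0000"))
```

Matching only at the start of the payload keeps the sketch short; content embedded at an offset, or split across several transactions, would need additional handling that is not shown here.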

Findings

01 Recovered 175 files, 296 images, and 91,206 texts

02 Achieved 0.9 accuracy in sentiment analysis

03 Detected 7 indecent images with 100% accuracy

Abstract

Blockchain technology, lauded for its transparent and immutable nature, introduces a novel trust model. However, its decentralized structure raises concerns about the potential inclusion of malicious or illegal content. This study focuses on Ethereum and presents a data identification and restoration algorithm. After recovering 175 common files, 296 images, and 91,206 texts, we applied the FastText algorithm for sentiment analysis, achieving an accuracy of 0.9 after parameter tuning. Classification yielded 70,189 neutral, 5,208 positive, and 15,810 negative texts, which aids in identifying sensitive or illicit information. Using the NSFWJS library, we detected seven indecent images with 100% accuracy. Our findings expose the coexistence of benign and harmful content on the Ethereum blockchain, including personal data, explicit images, divisive language, and racial discrimination.…
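The class counts quoted in the abstract can be turned into proportions with a few lines of arithmetic. The sketch below uses only the three figures given above; the variable names are illustrative. As quoted, the three counts sum to 91,207, one more than the 91,206 recovered texts, so the shares are normalized by the sum of the counts rather than by the reported total.

```python
# Sentiment class counts as quoted in the abstract.
counts = {"neutral": 70_189, "positive": 5_208, "negative": 15_810}

# Normalize by the sum of the three classes (91,207 as quoted).
total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]:,} texts ({share:.1%})")
```

On these figures, roughly 77.0% of the texts are neutral, 5.7% positive, and 17.3% negative.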

Taxonomy

Topics: Blockchain Technology Applications and Security · Misinformation and Its Impacts · Big Data and Digital Economy