Detecting Data Exfiltration through I2P Anonymity Networks: A Two-Phase Machine Learning Approach

Siddique Abubakr Muntaka; Muntaka Mohammed; Mansuru Mikail Azindo; Ibrahim Tanko; Franco Osei-Wusu; Edward Danso Ansong; Benjamin Yankson; Oliver Kornyo; Foster Yeboah; Jones Yeboah; Richmond Adams; Pulcheria Serwaa

arXiv:2605.20546·cs.CR·May 21, 2026

TL;DR

This paper introduces a two-phase machine learning system that first separates I2P traffic from normal network traffic (99.96% accuracy) and then classifies I2P flows as data exfiltration or legitimate activity (91.11% accuracy).

Contribution

It presents a novel two-stage ML approach that combines Random Forest and XGBoost for I2P traffic detection and behavioral threat assessment, outperforming the neural networks and SVMs it was compared against.
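
The two-stage structure can be pictured as a cascade in which only flows flagged as I2P reach the behavioral stage. The sketch below is a minimal illustration of that control flow, not the authors' implementation: the class name `TwoPhaseDetector`, the feature names `mean_packet_size`, `bytes_out`, and `bytes_in`, and both threshold rules are hypothetical stand-ins for the trained Random Forest and XGBoost models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A flow is represented as a mapping of feature name to value.
# The feature names used below are hypothetical, not taken from the paper.
Flow = Dict[str, float]


@dataclass
class TwoPhaseDetector:
    """Cascade: phase 1 gates which flows reach phase 2."""
    is_i2p: Callable[[Flow], bool]           # phase 1: I2P vs. normal traffic
    is_exfiltration: Callable[[Flow], bool]  # phase 2: exfiltration vs. legitimate I2P

    def classify(self, flow: Flow) -> str:
        # Flows that phase 1 considers normal never reach phase 2.
        if not self.is_i2p(flow):
            return "normal"
        if self.is_exfiltration(flow):
            return "i2p_exfiltration"
        return "i2p_legitimate"


# Toy stand-ins for trained models (assumptions for illustration only).
def toy_phase1(flow: Flow) -> bool:
    return flow["mean_packet_size"] > 1000.0


def toy_phase2(flow: Flow) -> bool:
    return flow["bytes_out"] > 10 * flow["bytes_in"]


detector = TwoPhaseDetector(is_i2p=toy_phase1, is_exfiltration=toy_phase2)

flows: List[Flow] = [
    {"mean_packet_size": 400.0, "bytes_out": 5000.0, "bytes_in": 100.0},
    {"mean_packet_size": 1200.0, "bytes_out": 900.0, "bytes_in": 800.0},
    {"mean_packet_size": 1200.0, "bytes_out": 90000.0, "bytes_in": 800.0},
]
labels = [detector.classify(f) for f in flows]
print(labels)
```

In a real pipeline the two callables would wrap the fitted classifiers' prediction methods. One consequence of the cascade design is that a phase 1 miss cannot be recovered by phase 2.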

Findings

1. Phase 1 achieved 99.96% accuracy in distinguishing I2P from normal traffic.

2. Phase 2 classified exfiltration versus legitimate I2P activity with 91.11% accuracy.

3. Tree-based ensemble methods outperform neural networks and SVMs for this task.

Abstract

The Invisible Internet Project (I2P) provides strong anonymity through garlic routing and distributed network architecture, making it attractive for legitimate privacy needs. Nevertheless, the same properties can be exploited by malicious actors to steal sensitive information from corporate networks without detection. Current network security measures often fail to detect I2P traffic, and existing literature has focused primarily on protocol-level traffic identification without addressing behavioral threat assessment. This paper proposes a two-stage machine-learning model for I2P traffic analysis using the SafeSurf Darknet 2025 dataset comprising 184,548 network flows. Phase 1 achieved 99.96% accuracy in distinguishing I2P traffic from normal network traffic using a Random Forest classifier, with only 2 false positives among 32,318 normal flows. Phase 2 performed behavioral analysis on…
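
As a quick check on the Phase 1 figures quoted above, the false positive rate implied by 2 false positives among 32,318 normal flows can be computed directly. This is a small arithmetic sketch; the function name `false_positive_rate` is illustrative. It uses only those two numbers and does not attempt to reproduce the 99.96% accuracy, which also depends on counts not given here.

```python
def false_positive_rate(false_positives: int, negatives: int) -> float:
    """Fraction of truly normal flows that were flagged as I2P."""
    return false_positives / negatives


# Values quoted in the abstract for Phase 1.
fpr = false_positive_rate(false_positives=2, negatives=32_318)
specificity = 1.0 - fpr  # share of normal flows correctly left alone

print(f"False positive rate: {fpr:.4%}")    # False positive rate: 0.0062%
print(f"Specificity: {specificity:.4%}")    # Specificity: 99.9938%
```

At this rate, roughly one normal flow in 16,000 would be misflagged as I2P on data resembling the evaluation set.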
