AnonLFI 2.0: Extensible Architecture for PII Pseudonymization in CSIRTs with OCR and Technical Recognizers
Cristhian Kapelinski, Douglas Lautert, Beatriz Machado, Diego Kreutz

TL;DR
AnonLFI 2.0 is a modular framework for pseudonymizing PII in cybersecurity reports. It integrates OCR and technical recognizers, and it preserves the structure of formats such as XML and JSON while protecting the data.
Contribution
It introduces an extensible architecture that combines HMAC-SHA256 pseudonymization with OCR and technical recognizers for PII and security artifacts, improving both privacy and utility in cybersecurity datasets.
Findings
Achieved perfect precision, with F1 scores of 76.5 and 92.13, in two case studies
Effectively preserves data structures like XML and JSON
Demonstrated high effectiveness in real-world cybersecurity data
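The structure-preservation finding can be illustrated with a minimal Python sketch. It is not the AnonLFI 2.0 implementation: the names `scrub_json`, `scrub_text`, and `_token`, the two regular expressions, the placeholder key, and the token format are all assumptions made for illustration. The idea shown is to walk a parsed JSON document and rewrite only string leaf values, so keys, nesting, and non-string values stay untouched.

```python
import hashlib
import hmac
import json
import re

KEY = b"demo-key"  # assumption: placeholder secret, not a real deployment key

# Assumption: two deliberately simple recognizers, far cruder than a real one.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _token(label: str, value: str) -> str:
    """Derive a deterministic placeholder from a keyed HMAC-SHA256 digest."""
    digest = hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"[{label}_{digest[:12]}]"


def scrub_text(text: str) -> str:
    """Replace recognized e-mail addresses and IPv4 addresses in one string."""
    text = EMAIL_RE.sub(lambda m: _token("EMAIL", m.group()), text)
    return IPV4_RE.sub(lambda m: _token("IP", m.group()), text)


def scrub_json(node):
    """Recurse through dicts and lists; only string leaves are rewritten."""
    if isinstance(node, dict):
        return {key: scrub_json(value) for key, value in node.items()}
    if isinstance(node, list):
        return [scrub_json(value) for value in node]
    if isinstance(node, str):
        return scrub_text(node)
    return node  # numbers, booleans, and null pass through unchanged


report = {"host": "10.0.0.5", "contact": ["admin@example.org"], "port": 443}
print(json.dumps(scrub_json(report), indent=2))
```

Because the traversal rebuilds each container with the same keys and order, the output remains valid JSON with the same shape as the input. An XML report could be handled in the same spirit by rewriting only text nodes and attribute values.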
Abstract
This work presents AnonLFI 2.0, a modular pseudonymization framework for CSIRTs that uses HMAC-SHA256 to generate strong and reversible pseudonyms, preserves XML and JSON structures, and integrates OCR and technical recognizers for PII and security artifacts. In two case studies, one applying OCR to PDF documents and one processing an OpenVAS XML report, the system achieved perfect precision, with F1 scores of 76.5 and 92.13, demonstrating its effectiveness for securely preparing complex cybersecurity datasets.
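A minimal Python sketch of keyed pseudonym generation follows. HMAC-SHA256 is a one-way function, so the sketch assumes that reversibility comes from a protected lookup table kept alongside the key; the abstract does not specify the mechanism. The class name `Pseudonymizer`, its methods, the 16-character truncation, and the token format are hypothetical and not taken from the paper.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Hypothetical helper: keyed pseudonyms plus a lookup table for reversal."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._reverse = {}  # assumption: pseudonym -> original, stored securely

    def pseudonymize(self, entity_type: str, value: str) -> str:
        # The same key and value always yield the same digest, so repeated
        # occurrences of an entity map to one consistent pseudonym.
        digest = hmac.new(
            self._key, value.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        token = f"[{entity_type}_{digest[:16]}]"  # assumed token format
        self._reverse[token] = value
        return token

    def restore(self, token: str) -> str:
        # Reversal is a table lookup; the digest itself cannot be inverted.
        return self._reverse[token]


p = Pseudonymizer(b"replace-with-a-random-secret")
token = p.pseudonymize("EMAIL", "analyst@example.org")
assert p.pseudonymize("EMAIL", "analyst@example.org") == token
assert p.restore(token) == "analyst@example.org"
```

Deterministic tokens keep cross-references within a report intact, while a party without the key cannot confirm a guessed value by recomputing the digest.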
Taxonomy
Topics: Digital and Cyber Forensics · Web Application Security Vulnerabilities · Cryptography and Data Security
