AnonLFI 2.0: Extensible Architecture for PII Pseudonymization in CSIRTs with OCR and Technical Recognizers

Cristhian Kapelinski; Douglas Lautert; Beatriz Machado; Diego Kreutz

arXiv:2511.15744 · cs.CR · November 21, 2025

Open Access

TL;DR

AnonLFI 2.0 is a modular framework for pseudonymizing PII in cybersecurity reports. It integrates OCR and technical recognizers, and it preserves the structure of XML and JSON inputs.

Contribution

It introduces an extensible architecture that combines pseudonymization with OCR and technical recognizers for PII and security artifacts, with the goal of improving both the privacy and the utility of cybersecurity datasets.
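
One way to picture an extensible set of technical recognizers is a small registry to which new patterns can be added without touching the replacement logic. The sketch below is illustrative only: the names `RecognizerRegistry`, `Match`, and `replace_entities`, as well as the regular expressions, are assumptions made for this example and are not taken from AnonLFI 2.0.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Pattern


@dataclass(frozen=True)
class Match:
    """A detected entity and its character span in the input text."""
    entity: str
    start: int
    end: int
    text: str


class RecognizerRegistry:
    """Hypothetical registry mapping entity names to regex recognizers."""

    def __init__(self) -> None:
        self._patterns: Dict[str, Pattern[str]] = {}

    def register(self, entity: str, pattern: str) -> None:
        # Adding a recognizer only requires a name and a pattern.
        self._patterns[entity] = re.compile(pattern)

    def find(self, text: str) -> List[Match]:
        matches = [
            Match(entity, m.start(), m.end(), m.group())
            for entity, rx in self._patterns.items()
            for m in rx.finditer(text)
        ]
        return sorted(matches, key=lambda m: m.start)


def replace_entities(
    text: str, registry: RecognizerRegistry, replacer: Callable[[Match], str]
) -> str:
    """Replace each detected span, skipping matches that overlap an earlier one."""
    out: List[str] = []
    cursor = 0
    for m in registry.find(text):
        if m.start < cursor:
            continue  # overlaps a span that was already replaced
        out.append(text[cursor:m.start])
        out.append(replacer(m))
        cursor = m.end
    out.append(text[cursor:])
    return "".join(out)


# Example recognizers (simplified patterns, assumed for illustration).
registry = RecognizerRegistry()
registry.register("IPV4", r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
registry.register("EMAIL", r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
```

In this sketch, supporting a new artifact type (for example a MAC address) is a single `register` call, and `replace_entities` can be given any replacement function, such as one that returns a keyed pseudonym.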

Findings

01. Achieved perfect precision in both case studies, with F1 scores of 76.5 and 92.13

02. Preserves structured formats such as XML and JSON

03. Demonstrated effectiveness on complex cybersecurity data (OCR-processed PDF documents and an OpenVAS XML report)

Abstract

This work presents AnonLFI 2.0, a modular pseudonymization framework for CSIRTs that uses HMAC-SHA256 to generate strong and reversible pseudonyms, preserves XML and JSON structures, and integrates OCR and technical recognizers for PII and security artifacts. In two case studies, one applying OCR to PDF documents and one processing an OpenVAS XML report, the system achieved perfect precision, with F1 scores of 76.5 and 92.13 respectively, demonstrating its effectiveness for securely preparing complex cybersecurity datasets.
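
To make the idea of keyed pseudonyms and structure preservation concrete, here is a minimal Python sketch. It is not the AnonLFI 2.0 implementation. Because HMAC itself is a one-way function, the sketch assumes that reversibility comes from a stored pseudonym-to-original lookup table; the abstract does not describe the mechanism. The names `Pseudonymizer` and `pseudonymize_json`, the `PII_` prefix, the 16-character truncation, and the key value are all hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict, Set


class Pseudonymizer:
    """Hypothetical keyed pseudonymizer with a lookup table for reversal."""

    def __init__(self, key: bytes, length: int = 16) -> None:
        self._key = key
        self._length = length
        self._reverse: Dict[str, str] = {}  # pseudonym -> original value

    def pseudonym(self, value: str) -> str:
        # Same key and same input always yield the same token.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"PII_{digest[:self._length]}"
        self._reverse[token] = value  # assumed reversal mechanism
        return token

    def restore(self, token: str) -> str:
        return self._reverse[token]


def pseudonymize_json(node: Any, sensitive_keys: Set[str], p: Pseudonymizer) -> Any:
    """Return a copy with the same keys and nesting; only selected strings change."""
    if isinstance(node, dict):
        return {
            k: p.pseudonym(v)
            if k in sensitive_keys and isinstance(v, str)
            else pseudonymize_json(v, sensitive_keys, p)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [pseudonymize_json(item, sensitive_keys, p) for item in node]
    return node  # numbers, booleans, None, and non-sensitive strings pass through


p = Pseudonymizer(b"example-key-not-for-production")  # hypothetical key
report = {"host": {"ip": "10.0.0.5", "ports": [22, 443]}, "contact": "alice@example.org"}
safe_report = pseudonymize_json(report, {"ip", "contact"}, p)
```

Here `safe_report` has the same keys, nesting, and list contents as `report`, and only the values under `ip` and `contact` are replaced. The lookup table would have to be protected as carefully as the key, since either one allows re-identification.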

Taxonomy

Topics: Digital and Cyber Forensics · Web Application Security Vulnerabilities · Cryptography and Data Security