Addressing Labelled Data Scarcity: Taxonomy-Agnostic Annotation of PII Values in HTTP Traffic using LLMs

Thomas Cory; Axel Küpper

arXiv:2605.06305·cs.AI·May 8, 2026

TL;DR

This paper explores whether Large Language Models can annotate PII in HTTP traffic against a taxonomy supplied at runtime, addressing two limits of existing detectors: dependence on fixed label sets and on scarce, manually labelled traffic.

Contribution

The paper introduces a multi-stage LLM pipeline for taxonomy-agnostic PII annotation and a synthetic data generator that allows evaluation without sensitive real-user data.

Findings

01

The pipeline accurately detects PII types across different taxonomies.

02

LLMs can extract PII values effectively in a taxonomy-agnostic manner.

03

Synthetic HTTP traffic with validated annotations supports evaluation without real user data.
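The third finding can be illustrated with a minimal sketch of what "validated annotations" might look like for synthetic traffic. This is not the authors' generator: the function names `generate_request` and `validate`, the two labels, and the body layout are all hypothetical. The idea shown is only that, because the generator knows which values it planted, the annotations are correct by construction and can be checked mechanically.

```python
import json
import random
from typing import Dict, List, Tuple

Annotations = Dict[str, List[str]]  # label -> PII values planted in the body


def generate_request(rng: random.Random) -> Tuple[str, Annotations]:
    """Build one synthetic JSON body and the annotations for the values planted in it.

    Hypothetical layout: two PII fields plus one non-PII session field.
    """
    email = f"user{rng.randint(1000, 9999)}@example.com"
    phone = "+49" + "".join(str(rng.randint(0, 9)) for _ in range(10))
    body = json.dumps(
        {
            "profile": {"email": email, "phone": phone},
            "session": {"id": f"{rng.getrandbits(64):016x}"},
        }
    )
    return body, {"email": [email], "phone": [phone]}


def validate(body: str, annotations: Annotations) -> bool:
    """Accept annotations only if every annotated value occurs literally in the body."""
    return all(value in body for values in annotations.values() for value in values)


if __name__ == "__main__":
    body, annotations = generate_request(random.Random(0))  # seeded for reproducibility
    print(body)
    print(annotations, validate(body, annotations))
```

Seeding the random generator keeps the synthetic corpus reproducible, and no real user data is involved at any point.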

Abstract

Automated privacy audits of web and mobile applications often analyse outbound HTTP traffic to detect Personally Identifiable Information (PII) leakage. However, existing learning-based detectors typically depend on scarce, manually labelled traffic and are tightly coupled to fixed label taxonomies, limiting transferability across domains and evolving definitions of PII. This paper investigates whether Large Language Models (LLMs) can support taxonomy-agnostic annotation of explicitly transmitted PII values in HTTP message bodies when the taxonomy is provided at runtime. We introduce a multi-stage LLM-based pipeline that combines deterministic pre-processing with label-level classification, targeted instance-level value annotation, and output validation. To enable controlled evaluation and exemplar-based prompting without relying on sensitive real-user captures, we further propose an…
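The four stages named in the abstract (deterministic pre-processing, label-level classification, instance-level value annotation, output validation) can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's implementation: the function names are hypothetical, the body is assumed to be JSON, and the two `stub_*` functions are keyword heuristics standing in for LLM calls so the example runs offline.

```python
import json
from typing import Callable, Dict, List

Taxonomy = Dict[str, str]  # label -> description, supplied at runtime
Fields = Dict[str, str]    # flattened key path -> string value


def flatten(body: str) -> Fields:
    """Stage 1, deterministic pre-processing: flatten a JSON body into path -> value."""
    out: Fields = {}

    def walk(node, path: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif node is not None:
            out[path] = str(node)

    walk(json.loads(body), "")
    return out


def stub_classify(fields: Fields, taxonomy: Taxonomy) -> List[str]:
    """Placeholder for the label-level LLM call: which labels appear at all?"""
    return [label for label in taxonomy if any(label in p.lower() for p in fields)]


def stub_extract(fields: Fields, label: str, description: str) -> List[str]:
    """Placeholder for the instance-level LLM call; deliberately adds one bogus value."""
    return [v for p, v in fields.items() if label in p.lower()] + ["hallucinated-value"]


def annotate(
    body: str,
    taxonomy: Taxonomy,
    classify: Callable[[Fields, Taxonomy], List[str]] = stub_classify,
    extract: Callable[[Fields, str, str], List[str]] = stub_extract,
) -> Dict[str, List[str]]:
    fields = flatten(body)
    # Stage 2: label-level classification; discard labels outside the taxonomy.
    present = [label for label in classify(fields, taxonomy) if label in taxonomy]
    annotations: Dict[str, List[str]] = {}
    for label in present:
        # Stage 3: targeted value annotation, one label at a time.
        candidates = extract(fields, label, taxonomy[label])
        # Stage 4: output validation; keep only values actually present in the body.
        valid = [value for value in candidates if value in fields.values()]
        if valid:
            annotations[label] = valid
    return annotations


if __name__ == "__main__":
    body = '{"user": {"email": "a@example.com", "age": 30}, "device": {"os": "android"}}'
    taxonomy = {"email": "An e-mail address", "phone": "A telephone number"}
    print(annotate(body, taxonomy))
```

Because the taxonomy is an ordinary argument, swapping in a different label set requires no retraining or code change, which is the sense of "taxonomy-agnostic" sketched here. The validation step drops the fabricated value that the stub extractor returns, illustrating why a deterministic check after the model output is useful.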
