PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues

Matthew Zent; Digory Smith; Simon Woodhead

arXiv:2505.16931 · cs.CL · May 23, 2025

TL;DR

PIIvot is a lightweight NLP framework for anonymizing PII in educational dialogues. It uses knowledge of the data context to simplify PII detection and is accompanied by QATD-2k, a new open-source tutoring dialogue dataset.

Contribution

Introduces PIIvot, a lightweight framework for PII anonymization that uses knowledge of the data context, and contributes QATD-2k, which the authors describe as the largest open-source real-world tutoring dataset of its kind.

Findings

01. PIIvot improves PII detection accuracy with lower computational complexity.

02. QATD-2k enables better research and development in educational dialogue anonymization.

03. The framework demonstrates its effectiveness on real-world tutoring data.

Abstract

Personally identifiable information (PII) anonymization is a high-stakes task that poses a barrier to many open-science data sharing initiatives. While PII identification has made large strides in recent years, in practice, error thresholds and the recall/precision trade-off still limit the uptake of these anonymization pipelines. We present PIIvot, a lighter-weight framework for PII anonymization that leverages knowledge of the data context to simplify the PII detection problem. To demonstrate its effectiveness, we also contribute QATD-2k, the largest open-source real-world tutoring dataset of its kind, to support the demand for quality educational dialogue data.
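
The recall/precision trade-off mentioned in the abstract can be illustrated with a small sketch. This is not PIIvot's method or API; the function name `precision_recall`, the confidence scores, and the labels are all hypothetical, chosen only to show how moving a detection threshold trades missed PII (lower recall) against over-redaction (lower precision).

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every token scoring >= threshold is flagged as PII."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and l for f, l in zip(flagged, labels))          # PII correctly flagged
    fp = sum(f and not l for f, l in zip(flagged, labels))      # non-PII flagged (over-redaction)
    fn = sum((not f) and l for f, l in zip(flagged, labels))    # PII missed (a leak)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical detector confidences for six tokens, and whether each is truly PII.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

strict = precision_recall(scores, labels, threshold=0.70)    # flags only confident tokens
lenient = precision_recall(scores, labels, threshold=0.35)   # flags more tokens

print(f"strict:  precision={strict[0]:.2f} recall={strict[1]:.2f}")
print(f"lenient: precision={lenient[0]:.2f} recall={lenient[1]:.2f}")
```

On this toy data the strict threshold misses one PII token, while the lenient threshold catches every PII token at the cost of redacting one harmless token. In an anonymization setting a missed token is a privacy leak, which is why pipelines tend to favor recall and accept some loss of precision.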

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Advanced Graph Neural Networks · Topic Modeling