PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues
Matthew Zent, Digory Smith, Simon Woodhead

TL;DR
PIIvot is a lightweight NLP framework for effective PII anonymization in educational dialogues. It leverages knowledge of the data context to simplify detection and is accompanied by a new large-scale tutoring dataset.
Contribution
Introduces PIIvot, a novel lightweight framework for PII anonymization that leverages knowledge of the data context, and releases QATD-2k, the largest open-source real-world tutoring dataset of its kind.
Findings
PIIvot uses knowledge of the data context to simplify the PII detection problem, yielding a lighter-weight anonymization pipeline.
QATD-2k addresses the demand for quality educational dialogue data and supports research on educational dialogue anonymization.
The framework's effectiveness is demonstrated on real-world tutoring data.
Abstract
Personally identifiable information (PII) anonymization is a high-stakes task that poses a barrier to many open-science data sharing initiatives. While PII identification has made large strides in recent years, in practice, error thresholds and the recall/precision trade-off still limit the uptake of these anonymization pipelines. We present PIIvot, a lighter-weight framework for PII anonymization that leverages knowledge of the data context to simplify the PII detection problem. To demonstrate its effectiveness, we also contribute QATD-2k, the largest open-source real-world tutoring dataset of its kind, to support the demand for quality educational dialogue data.
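The abstract does not describe PIIvot's detection mechanism. The sketch below is a hypothetical illustration, not the authors' method, of how knowledge of the data context could reduce false positives in a question-anchored dialogue and how that shows up in precision and recall. It assumes names that appear in the anchoring question are fictional characters rather than PII. The lexicon-based detector, the example dialogue, the gold labels, and the helpers `detect_names`, `filter_with_context`, and `precision_recall` are all invented for this example.

```python
import re

# Hypothetical name lexicon standing in for a real PII/NER model.
NAME_LEXICON = {"sam", "priya", "jordan"}

# Toy question-anchored example (invented for illustration).
QUESTION = "Sam has 12 apples and gives 5 to a friend. How many are left?"
DIALOGUE = "Tutor: Hi Priya, what does Sam have left? Student: Sam has 7, Jordan said so."
GOLD_PII = {"Priya", "Jordan"}  # hand-labelled: real people, not question characters


def tokens(text: str) -> list[str]:
    """Split text into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text)


def detect_names(text: str) -> list[str]:
    """Context-free detector: flag every token found in NAME_LEXICON."""
    return [tok for tok in tokens(text) if tok.lower() in NAME_LEXICON]


def filter_with_context(candidates: list[str], question: str) -> list[str]:
    """Drop candidates that also occur in the anchoring question.

    Assumption: names in the question are fictional and therefore not PII.
    """
    question_vocab = {tok.lower() for tok in tokens(question)}
    return [c for c in candidates if c.lower() not in question_vocab]


def precision_recall(predicted: list[str], gold: set[str]) -> tuple[float, float]:
    """Type-level precision and recall (duplicates collapsed, no span offsets)."""
    pred = set(predicted)
    true_pos = len(pred & gold)
    precision = true_pos / len(pred) if pred else 1.0  # convention: no predictions -> 1.0
    recall = true_pos / len(gold) if gold else 1.0
    return precision, recall


naive = detect_names(DIALOGUE)
filtered = filter_with_context(naive, QUESTION)
print("context-free:", naive, precision_recall(naive, GOLD_PII))
print("with context:", filtered, precision_recall(filtered, GOLD_PII))
```

In this toy case the context filter removes the false positive "Sam" and raises precision without losing recall. The same rule would, however, miss a real student who happens to share a name with a question character, which is the kind of recall/precision trade-off the abstract says limits the uptake of anonymization pipelines.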
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Advanced Graph Neural Networks · Topic Modeling
