Enhancing the De-identification of Personally Identifiable Information in Educational Data
Zilyu Ji, Yuntian Shen, Jionghao Lin, Kenneth R. Koedinger

TL;DR
This study evaluates GPT-4o-mini for PII detection in educational data, demonstrating its superior accuracy, cost-efficiency, and robustness across diverse datasets and cultural backgrounds, advancing privacy-preserving educational data analysis.
Contribution
The paper fine-tunes GPT-4o-mini for PII detection and shows that it outperforms established frameworks such as Microsoft Presidio and Azure AI Language in accuracy and cost, with strong generalizability and bias mitigation.
Findings
Fine-tuned GPT-4o-mini achieves a recall of 0.9589 on CRAPII.
Fine-tuned GPT-4o-mini triples precision and reduces costs.
Model maintains accuracy across diverse cultural and gender groups.
Abstract
Protecting Personally Identifiable Information (PII), such as names, is a critical requirement in learning technologies to safeguard student and teacher privacy and maintain trust. Accurate PII detection is an essential step toward anonymizing sensitive information while preserving the utility of educational data. Motivated by recent advancements in artificial intelligence, our study investigates the GPT-4o-mini model as a cost-effective and efficient solution for PII detection tasks. We explore both prompting and fine-tuning approaches and compare GPT-4o-mini's performance against established frameworks, including Microsoft Presidio and Azure AI Language. Our evaluation on two public datasets, CRAPII and TSCC, demonstrates that the fine-tuned GPT-4o-mini model achieves superior performance, with a recall of 0.9589 on CRAPII. Additionally, fine-tuned GPT-4o-mini significantly improves…
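The recall and precision figures quoted above can be read against a small worked example. The sketch below is illustrative only: it assumes PII entities are represented as character-offset spans and scored by exact match, which may differ from the scoring the authors used. The names `Span` and `precision_recall` and the sample data are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: a PII entity is a (start, end) character span.
Span = Tuple[int, int]


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over sets of PII spans.

    Assumption: a prediction counts as correct only if its span matches
    a gold span exactly. Token-level or partial-match scoring would give
    different numbers.
    """
    true_positives = len(gold & predicted)
    # Precision: share of flagged spans that really are PII.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real PII spans that were flagged.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: three real PII spans, four flagged spans, two in common.
gold = {(0, 5), (10, 18), (30, 34)}
predicted = {(0, 5), (10, 18), (40, 45), (50, 53)}
p, r = precision_recall(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f}")
```

In de-identification, a missed entity (lower recall) leaves PII in the data, while a false alarm (lower precision) redacts harmless text and reduces the data's utility, which is why the two metrics are reported side by side.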
Taxonomy
Topics: Online Learning and Analytics · Educational Assessment and Improvement
