An Annotation Scheme and Classifier for Personal Facts in Dialogue
Konstantin Zaitsev

TL;DR
This paper introduces an enhanced annotation scheme and a transformer-based classifier for personal facts in dialogue, improving accuracy over few-shot LLM baselines and enabling structured fact management.
Contribution
It extends existing schemes with new categories and attributes, and provides a high-performing, resource-efficient classifier trained on a large annotated dataset.
Findings
The classifier achieves 81.6% macro F1, outperforming the best few-shot LLM baseline (GPT-5.4-mini, 72.92%) by nearly 9 percentage points.
New categories and attributes enable better structured storage and filtering of personal facts.
Error analysis highlights persistent challenges with semantic boundaries and pragmatic reasoning.
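For readers unfamiliar with the headline metric, the following is a minimal sketch of macro F1: the per-class F1 scores are averaged with equal weight, so rare classes count as much as frequent ones. The function name `macro_f1` and the toy labels are illustrative assumptions, and this summary does not state how the paper aggregates scores across the classifier's heads.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Because every class contributes equally, a model that ignores a small category is penalized more heavily under macro F1 than under plain accuracy.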
Abstract
The advancement of Large Language Models (LLMs) has enabled their application in personalized dialogue systems. We present an extended annotation scheme for personal fact classification that addresses limitations in existing approaches, particularly PeaCoK. Our scheme introduces new categories (Demographics, Possessions) and attributes (Duration, Validity, Followup) that enable structured storage, quality filtering, and identification of facts suitable for dialogue continuation. We manually annotated 2,779 facts from Multi-Session Chat and trained a multi-head classifier based on transformer encoders. Combined with the Gemma-300M encoder, the classifier achieves 81.6% macro F1, outperforming all few-shot LLM baselines (best: GPT-5.4-mini, 72.92%) by nearly 9 percentage points while requiring substantially fewer computational resources. Error analysis reveals persistent…
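As an illustration of how the attributes could support structured storage and filtering, here is a minimal sketch of an annotated fact record. The category names Demographics and Possessions and the attribute names Duration, Validity, and Followup come from the abstract; the class `PersonalFact`, the function `followup_candidates`, the value types, and the example duration values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PersonalFact:
    text: str
    category: str   # e.g. "Demographics" or "Possessions" (named in the abstract)
    duration: str   # assumed values such as "long_term" / "short_term"
    validity: bool  # assumed: True if the fact passes quality filtering
    followup: bool  # assumed: True if the fact suits dialogue continuation


def followup_candidates(facts: List[PersonalFact]) -> List[PersonalFact]:
    """Keep only facts marked both valid and suitable for a follow-up."""
    return [f for f in facts if f.validity and f.followup]
```

With records of this shape, a dialogue system could store facts per user and query them by category or attribute instead of re-reading raw conversation history.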
