Domain-Adapted Retrieval for In-Context Annotation of Pedagogical Dialogue Acts
Jinsook Lee, Kirk Vanacore, Zhuqian Zhou, Bakhtawar Ahtisham, Rene F. Kizilcec

TL;DR
This paper introduces a domain-adapted retrieval-augmented generation (RAG) pipeline that improves pedagogical dialogue act annotation by fine-tuning a lightweight embedding model and indexing dialogues at the utterance level, outperforming no-retrieval baselines.
Contribution
It demonstrates that domain-adapted retrieval substantially improves agreement with reference dialogue act annotations (measured by Cohen's κ) without fine-tuning the large language models themselves.
Findings
Achieves Cohen's κ of 0.526-0.580 on TalkMoves and 0.659-0.743 on Eedi datasets.
Utterance-level indexing is the main factor driving performance gains.
Retrieval corrects systematic label biases and improves rare label detection.
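For readers unfamiliar with the metric reported above, Cohen's κ corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e the chance agreement implied by each annotator's label frequencies. The following is a minimal sketch of that formula, not the paper's evaluation code; the label names and the toy data are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: a human reference and a model annotation.
human = ["revoicing", "press", "none", "none"]
model = ["revoicing", "none", "none", "none"]
kappa = cohens_kappa(human, model)
```

In this toy case the raw agreement is 0.75, but because both annotators use "none" often, the chance-corrected value is noticeably lower.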
Abstract
Automated annotation of pedagogical dialogue is a high-stakes task where LLMs often fail without sufficient domain grounding. We present a domain-adapted RAG pipeline for tutoring move annotation. Rather than fine-tuning the generative model, we adapt retrieval by fine-tuning a lightweight embedding model on tutoring corpora and indexing dialogues at the utterance level to retrieve labeled few-shot demonstrations. Evaluated across two real tutoring dialogue datasets (TalkMoves and Eedi) and three LLM backbones (GPT-5.2, Claude Sonnet 4.6, Qwen3-32b), our best configuration achieves Cohen's κ of 0.526-0.580 on TalkMoves and 0.659-0.743 on Eedi, substantially outperforming no-retrieval baselines (- and -). An ablation study reveals that utterance-level indexing, rather than embedding quality alone, is the primary driver of these gains, with…
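The retrieval step described in the abstract (index each labeled utterance separately, retrieve the nearest ones for a new utterance, and place them in the prompt as few-shot demonstrations) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for the fine-tuned embedding model, and the function names, label names, example utterances, and prompt layout are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for the fine-tuned embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def build_index(dialogues):
    """Utterance-level index: one entry per labeled utterance, not per dialogue."""
    return [
        (embed(text), text, label)
        for dialogue in dialogues
        for text, label in dialogue
    ]


def retrieve(index, query, k=2):
    """Return the k labeled utterances most similar to the query utterance."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[0]), reverse=True)
    return [(text, label) for _, text, label in ranked[:k]]


def build_prompt(demos, query):
    """Format retrieved demonstrations plus the unlabeled query utterance."""
    blocks = [f"Utterance: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Utterance: {query}\nLabel:")
    return "\n\n".join(blocks)


# Hypothetical labeled tutoring dialogues (lists of (utterance, label) pairs).
dialogues = [
    [
        ("Can you explain why you multiplied by two?", "press_for_reasoning"),
        ("Good job, that is correct.", "praise"),
    ],
    [
        ("So you are saying the answer is ten?", "revoicing"),
        ("Please open your notebook.", "none"),
    ],
]
index = build_index(dialogues)
query = "Can you explain why you added three?"
demos = retrieve(index, query, k=2)
prompt = build_prompt(demos, query)
```

The resulting `prompt` would then be sent to whichever LLM backbone performs the annotation. Indexing at the utterance level means each retrieved demonstration is a single labeled turn rather than a whole dialogue, which keeps the few-shot examples closely matched to the turn being labeled.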
