TL;DR
This paper evaluates neural language models for dialog tutoring, revealing their strengths in constrained scenarios and highlighting significant challenges like reasoning errors and engagement issues in real educational settings.
Contribution
It provides a comprehensive analysis of current neural dialog tutors, identifying key limitations and outlining future research directions for effective educational applications.
Findings
Models perform well in constrained scenarios with few concepts.
Both the models and the dataset annotations score low on fairness and engagement.
45% of conversations contain reasoning errors.
Abstract
Designing dialog tutors has been challenging as it involves modeling the diverse and complex pedagogical strategies employed by human tutors. Although there have been significant recent advances in neural conversational systems using large language models (LLMs) and growth in available dialog corpora, dialog tutoring has largely remained unaffected by these advances. In this paper, we rigorously analyze various generative language models on two dialog tutoring datasets for language learning using automatic and human evaluations to understand the new opportunities brought by these advances as well as the challenges we must overcome to build models that would be usable in real educational settings. We find that although current approaches can model tutoring in constrained learning scenarios when the number of concepts to be taught and possible teacher strategies are small, they perform…
