Developing a Tutoring Dialog Dataset to Optimize LLMs for Educational Use

Menna Fateen; Tsunenori Mine

arXiv:2410.19231·cs.CL·October 28, 2024

Open Access

TL;DR

This paper presents a cost-effective approach to building LLM-based tutoring systems: a synthetic tutoring dialog dataset is used to fine-tune a smaller LLM, which then performs on par with a larger model in an interactive tutoring experiment.

Contribution

The study introduces a synthetic tutoring dialog dataset, evaluated by human teachers, and shows that a smaller LLM fine-tuned on it can match a larger model's performance at lower cost.

Findings

01

Fine-tuned smaller LLMs perform on par with larger models in tutoring tasks.

02

Synthetic datasets can effectively train LLMs for educational applications.

03

Cost is reduced without sacrificing model effectiveness.

Abstract

Recent advances in large language models (LLMs) have shown promise for scalable educational applications, but their use in dialog-based tutoring systems remains challenging due to the need for effective pedagogical strategies and the high costs associated with expert-curated datasets. Our study explores the use of smaller, more affordable LLMs for one-on-one tutoring in the context of solving reading comprehension problems. We developed a synthetic tutoring dialog dataset, evaluated by human teachers, and fine-tuned a smaller LLM using this dataset. Furthermore, we conducted an interactive experiment comparing the performance of the fine-tuned model with a larger model in real-world tutoring scenarios. Our results show that the fine-tuned model performs on par with the larger model but at a lower cost, demonstrating a viable, cost-effective approach for implementing LLM-based tutoring…

Taxonomy

Topics: Semantic Web and Ontologies · Speech and dialogue systems · Topic Modeling