Evaluating Large Language Models for automatic analysis of teacher simulations

David de-Fitero-Dominguez; Mariano Albaladejo-González; Antonio Garcia-Cabot; Eva Garcia-Lopez; Antonio Moreno-Cediel; Erin Barno; Justin Reich

arXiv:2407.20360·cs.AI·July 31, 2024

Open Access

TL;DR

This paper evaluates the effectiveness of Large Language Models, specifically DeBERTaV3 and Llama 3, in automatically analyzing responses in digital simulations for teacher training, highlighting their varying performance in identifying user behaviors.

Contribution

The study compares LLMs for analyzing teacher simulation responses and finds that Llama 3 is more stable than DeBERTaV3 when it has to detect new characteristics.

Findings

01. Llama 3 outperforms DeBERTaV3 in identifying new response characteristics.

02. Performance of LLMs varies significantly depending on the characteristic to identify.

03. Llama 3 shows more stable performance in dynamic educational scenarios.

Abstract

Digital Simulations (DS) provide safe environments where users interact with an agent through conversational prompts, providing engaging learning experiences that can be used to train teacher candidates in realistic classroom scenarios. These simulations usually include open-ended questions, allowing teacher candidates to express their thoughts but complicating an automatic response analysis. To address this issue, we have evaluated Large Language Models (LLMs) to identify characteristics (user behaviors) in the responses of DS for teacher education. We evaluated the performance of DeBERTaV3 and Llama 3, combined with zero-shot, few-shot, and fine-tuning. Our experiments discovered a significant variation in the LLMs' performance depending on the characteristic to identify. Additionally, we noted that DeBERTaV3 significantly reduced its performance when it had to identify new…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Online Learning and Analytics

Methods: LLaMA