Comparing Few-Shot Prompting of GPT-4 LLMs with BERT Classifiers for Open-Response Assessment in Tutor Equity Training

Sanjit Kakarla; Conrad Borchers; Danielle Thomas; Shambhavi Bhushan; Kenneth R. Koedinger

arXiv:2501.06658 · cs.HC · January 14, 2025

TL;DR

This study compares fine-tuned BERT classifiers with few-shot-prompted GPT-4 models for assessing open responses in equity-focused tutor training, and finds BERT both more effective and more resource-efficient on nuanced tasks.

Contribution

The paper demonstrates that a fine-tuned BERT classifier outperforms few-shot-prompted GPT-4 models on complex, nuanced assessment tasks in equity training.

Findings

01. BERT outperforms GPT-4 in accuracy for open-response assessment.

02. Fine-tuning BERT is more resource-efficient than prompting GPT-4.

03. GPT-4 models struggle with nuanced, explanation-based responses.

Abstract

Assessing learners in ill-defined domains, such as scenario-based human tutoring training, is an area of limited research. Equity training requires a nuanced understanding of context, but do contemporary large language models (LLMs) have a knowledge base that can navigate these nuances? Legacy transformer models like BERT, in contrast, have less real-world knowledge but can be more easily fine-tuned than commercial LLMs. Here, we study whether fine-tuning BERT on human annotations outperforms state-of-the-art LLMs (GPT-4o and GPT-4-Turbo) with few-shot prompting and instruction. We evaluate performance on four prediction tasks involving generating and explaining open-ended responses in advocacy-focused training lessons in a higher education student population learning to become middle school tutors. Leveraging a dataset of 243 human-annotated open responses from tutor training lessons,…

Code & Models

Repositories

conradborchers/bert-llm-open-response (official)
