The Impact of Example Selection in Few-Shot Prompting on Automated Essay Scoring Using GPT Models
Lui Yoshida

TL;DR
This paper examines how the choice and order of examples in few-shot prompting affect GPT-based automated essay scoring, revealing label biases and performance differences across model versions.
Contribution
It provides a detailed analysis of how example selection affects the scoring accuracy and bias of GPT models, highlighting differences between GPT-3.5 and GPT-4.
Findings
Example selection significantly impacts GPT-3.5 scores
GPT-3.5 shows stronger biases than GPT-4
Careful example selection can improve GPT-3.5 performance
Abstract
This study investigates the impact of example selection on the performance of automated essay scoring (AES) using few-shot prompting with GPT models. We evaluate the effects of the choice and order of examples in few-shot prompting on several versions of GPT-3.5 and GPT-4 models. Our experiments involve 119 prompts with different examples, and we calculate the quadratic weighted kappa (QWK) to measure the agreement between GPT and human rater scores. Regression analysis is used to quantitatively assess biases introduced by example selection. The results show that the impact of example selection on QWK varies across models, with GPT-3.5 being more influenced by examples than GPT-4. We also find evidence of majority label bias, which is a tendency to favor the majority label among the examples, and recency bias, which is a tendency to favor the label of the most recent example, in…
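For readers unfamiliar with the agreement metric and the two biases named in the abstract, the following is a minimal Python sketch, not the author's code. It computes the standard quadratic weighted kappa between two integer score lists, and it shows one assumed way to derive the "majority label" and "most recent label" of a few-shot prompt from the scores of its examples. The function names and the tie-breaking rule for the majority label are hypothetical illustrations; the paper's actual regression specification is not given in this summary.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                             min_score: int, max_score: int) -> float:
    """Cohen's kappa with quadratic weights (standard QWK definition)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    k = max_score - min_score + 1
    n = len(rater_a)
    # Observed matrix: observed[i][j] counts essays scored i by A and j by B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    # Marginal score histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0:
        # Both raters gave one identical constant score: perfect agreement.
        return 1.0
    return 1.0 - numerator / denominator


def majority_label(example_scores: Sequence[int]) -> int:
    """Most frequent score among the few-shot examples (hypothetical helper).

    Ties go to the score encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    return Counter(example_scores).most_common(1)[0][0]


def recency_label(example_scores: Sequence[int]) -> int:
    """Score of the last (most recent) example in the prompt (hypothetical helper)."""
    return example_scores[-1]


if __name__ == "__main__":
    human = [1, 2, 3]
    gpt = [1, 2, 2]
    print(quadratic_weighted_kappa(human, gpt, 1, 3))  # → 0.666... (2/3)
    examples = [2, 3, 3, 1]
    print(majority_label(examples), recency_label(examples))  # → 3 1
```

Under this assumed encoding, indicators such as "the GPT score equals the majority label" or "the GPT score equals the most recent label" could serve as regressors when testing for majority label bias and recency bias.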
