Automated scoring of the Ambiguous Intentions Hostility Questionnaire using fine-tuned large language models
Y. Lyu, D. Combs, D. Neumann, Y. C. Leong

TL;DR
This study demonstrates that fine-tuned large language models can accurately automate the scoring of open-ended responses in the Ambiguous Intentions Hostility Questionnaire, reducing the need for time-consuming human ratings.
Contribution
We developed and validated a fine-tuned large language model approach for automated scoring of AIHQ responses, showing high alignment with human ratings and generalization across datasets.
Findings
Model ratings aligned with human ratings for hostility and aggression.
Fine-tuned models outperformed baseline models in accuracy.
Models generalized well to an independent nonclinical dataset.
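One way to quantify the alignment between model and human ratings described above is sketched below. The summary does not state which agreement metric the authors used, so the Pearson correlation here is an assumption, and the function names and example ratings are hypothetical illustrations rather than the authors' code or data.

```python
import math
import random


def split_half(items, seed=0):
    """Shuffle a copy of `items` and split it into two halves
    (for example, one half for fine-tuning and one for testing)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical ratings for held-out responses (not from the paper).
    human_ratings = [1, 3, 2, 5, 4, 2]
    model_ratings = [1, 3, 3, 5, 4, 1]
    print(f"human-model r = {pearson_r(human_ratings, model_ratings):.3f}")
```

The same correlation could be computed separately for hostility and aggression ratings, and again on an independent dataset to check generalization.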
Abstract
Hostile attribution bias is the tendency to interpret social interactions as intentionally hostile. The Ambiguous Intentions Hostility Questionnaire (AIHQ) is commonly used to measure hostile attribution bias, and includes open-ended questions where participants describe the perceived intentions behind a negative social situation and how they would respond. While these questions provide insights into the contents of hostile attributions, they require time-intensive scoring by human raters. In this study, we assessed whether large language models can automate the scoring of AIHQ open-ended responses. We used a previously collected dataset in which individuals with traumatic brain injury (TBI) and healthy controls (HC) completed the AIHQ and had their open-ended responses rated by trained human raters. We used half of these responses to fine-tune the two models on human-generated ratings,…
