Scoring with Large Language Models: A Study on Measuring Empathy of Responses in Dialogues

Henry J. Xie; Jinghan Zhang; Xinhao Zhang; Kunpeng Liu

arXiv:2412.20264 · cs.CL · December 31, 2024

TL;DR

This study investigates how large language models score empathy in dialogues, developing a framework that uses explicit features and classifiers to understand and approximate LLM empathy scoring performance.

Contribution

The paper introduces a framework for analyzing how LLMs score empathy, using explicit features and classifiers to make LLM-based evaluation more interpretable.

Findings

01. Embedding-based features achieve performance close to generic LLMs.

02. Combining MITI Code and explicit subfactors improves scoring accuracy.

03. Feature selection identifies key features for empathy scoring.

Abstract

In recent years, Large Language Models (LLMs) have become increasingly powerful in their ability to complete complex tasks. One task for which LLMs are often employed is scoring, i.e., assigning a numerical value from a certain scale to a subject. In this paper, we strive to understand how LLMs score, specifically in the context of empathy scoring. We develop a novel and comprehensive framework for investigating how effective LLMs are at measuring and scoring empathy of responses in dialogues, and what methods can be employed to deepen our understanding of LLM scoring. Our strategy is to approximate the performance of state-of-the-art and fine-tuned LLMs with explicit and explainable features. We train classifiers using various features of dialogues including embeddings, the Motivational Interviewing Treatment Integrity (MITI) Code, a set of explicit subfactors of empathy as…
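The idea of training a classifier on explicit dialogue features can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names, the toy feature values, the 1-to-3 score scale, and the choice of a nearest-centroid classifier are hypothetical and are not taken from the paper, whose actual classifiers and feature definitions are not given in this listing.

```python
from collections import defaultdict
from math import dist


def build_features(embedding, miti_counts, subfactor_scores):
    """Concatenate three feature groups into one flat vector.

    All three inputs are hypothetical stand-ins: a response embedding,
    counts of MITI behavior codes, and scores for empathy subfactors.
    """
    return list(embedding) + list(miti_counts) + list(subfactor_scores)


def fit_centroids(vectors, labels):
    """Nearest-centroid training: average the vectors of each label."""
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Toy training data (made up): label 3 = high empathy, 1 = low empathy.
train = [
    (build_features([0.9, 0.1], [2, 0], [0.8]), 3),
    (build_features([0.8, 0.2], [3, 0], [0.9]), 3),
    (build_features([0.1, 0.9], [0, 2], [0.1]), 1),
    (build_features([0.2, 0.8], [0, 3], [0.2]), 1),
]
centroids = fit_centroids([v for v, _ in train], [y for _, y in train])

high_query = build_features([0.7, 0.3], [2, 1], [0.7])
low_query = build_features([0.2, 0.9], [0, 2], [0.1])
print(predict(centroids, high_query))  # 3
print(predict(centroids, low_query))   # 1
```

Because each dimension of the vector corresponds to a named feature, the distance to each centroid can be inspected per feature, which is the kind of explainability that an opaque LLM score does not offer directly.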

Code & Models

Repositories

henryjxie/Scoring-with-Large-Language-Models (official)

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling

Methods: Sparse Evolutionary Training · Feature Selection