It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments

Petter Mæhlum; David Samuel; Rebecka Maria Norman; Elma Jelin; Øyvind Andresen Bjertnæs; Lilja Øvrelid; Erik Velldal

arXiv:2404.18832 · cs.CL · April 30, 2024 · 3 cites

TL;DR

This study compares human and large language model-based sentiment annotation of Norwegian patient comments, highlighting the strengths and limitations of LLMs as alternatives to human annotators in healthcare sentiment analysis.

Contribution

It provides an extensive evaluation of LLMs for sentiment annotation in healthcare data, demonstrating their potential and current limitations compared to human annotators.

Findings

1. LLMs perform well above baseline in binary sentiment detection.
2. LLMs cannot yet match human annotators on full datasets.
3. Zero-shot LLM performance is promising but not sufficient for full accuracy.
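The comparison behind the first finding (LLM labels scored against human labels, relative to a baseline) can be illustrated with a small scoring sketch. It assumes macro-averaged F1 as the metric and a majority-class predictor as the baseline; this summary does not say which metric or baseline the paper actually uses, and the labels in the usage lines are invented toy data, not results from the study.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Macro-averaged F1 over every label seen in gold or pred (assumed metric)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

def majority_baseline(gold):
    """Hypothetical baseline: predict the most frequent human label for every comment."""
    most_common = Counter(gold).most_common(1)[0][0]
    return [most_common] * len(gold)

# Toy data only: human (gold) labels vs. hypothetical LLM labels for five comments.
gold = ["pos", "neg", "neg", "pos", "neg"]
llm = ["pos", "neg", "pos", "pos", "neg"]
print(macro_f1(gold, majority_baseline(gold)))  # majority baseline
print(macro_f1(gold, llm))                      # LLM annotator
```

Macro-averaging is used here because a majority-class baseline can look strong on raw accuracy when one sentiment dominates; averaging per-class F1 penalises it for never predicting the minority class.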

Abstract

Sentiment analysis is an important tool for aggregating patient voices, in order to provide targeted improvements in healthcare services. A prerequisite for this is the availability of in-domain data annotated for sentiment. This article documents an effort to add sentiment annotations to free-text comments in patient surveys collected by the Norwegian Institute of Public Health (NIPH). However, annotation can be a time-consuming and resource-intensive process, particularly when it requires domain expertise. We therefore also evaluate a possible alternative to human annotation, using large language models (LLMs) as annotators. We perform an extensive evaluation of the approach for two openly available pretrained LLMs for Norwegian, experimenting with different configurations of prompts and in-context learning, comparing their performance to human annotators. We find that even for…
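The prompt configurations and in-context learning mentioned in the abstract can be pictured with a minimal prompt-building sketch. Everything in it is an assumption for illustration: the English instruction text, the `Comment:`/`Sentiment:` layout, the label set, and the helper names `build_prompt` and `parse_label` are hypothetical, and the paper's actual (Norwegian) prompts are not reproduced here. No model call is made; the resulting string would be passed to whichever pretrained LLM is being evaluated.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical instruction and label set; not the paper's actual prompt.
INSTRUCTION = "Classify the sentiment of the patient comment as positive or negative."
LABELS = ("positive", "negative")

def build_prompt(demonstrations: Sequence[Tuple[str, str]], comment: str) -> str:
    """Assemble a k-shot prompt: instruction, k labelled demonstrations, then the query.

    With an empty demonstration list this reduces to a zero-shot prompt.
    """
    parts = [INSTRUCTION, ""]
    for text, label in demonstrations:
        parts.append(f"Comment: {text}")
        parts.append(f"Sentiment: {label}")
        parts.append("")
    parts.append(f"Comment: {comment}")
    parts.append("Sentiment:")
    return "\n".join(parts)

def parse_label(completion: str) -> Optional[str]:
    """Map the model's continuation to a label, or None if no known label starts it."""
    words = completion.strip().lower().split()
    if words:
        first = words[0].strip(".,:;!")
        if first in LABELS:
            return first
    return None

zero_shot = build_prompt([], "The nurses were kind, but the wait was long.")
two_shot = build_prompt(
    [("Friendly and helpful staff.", "positive"), ("Nobody answered my questions.", "negative")],
    "The nurses were kind, but the wait was long.",
)
print(two_shot)
```

Returning `None` for an unparseable continuation keeps such outputs countable as a separate failure case instead of silently assigning them to one of the sentiment classes.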

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining