Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?
Guan-Ting Lin, Hung-yi Lee

TL;DR
This paper introduces Emphasized-Talk, a benchmark for evaluating LLMs' ability to understand emphasis in dialogue, revealing that current models perform reasonably but still need improvement in grasping implied meanings.
Contribution
The paper presents a new benchmark of emphasis-annotated dialogue samples and an automatic GPT-4-based evaluation pipeline for assessing how well LLMs understand emphasis.
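To make "emphasis-annotated dialogue sample" concrete, here is a minimal sketch of how such a sample might be stored and shown to a text-only LLM. The record fields (`context`, `utterance`, `emphasized`, `implication`), the asterisk marker, and the prompt wording are all hypothetical assumptions for illustration. They are not the benchmark's actual schema or prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmphasizedSample:
    """Hypothetical record for one emphasis-annotated dialogue turn.

    Field names are illustrative assumptions, not the benchmark's schema.
    """
    context: List[str]      # preceding dialogue turns
    utterance: str          # the sentence that carries emphasis
    emphasized: List[int]   # indices of emphasized words in utterance.split()
    implication: str        # reference reading of what the emphasis implies


def render_with_emphasis(sample: EmphasizedSample, marker: str = "*") -> str:
    """Wrap each emphasized word in a text marker (an assumed convention)."""
    targets = set(sample.emphasized)
    words = sample.utterance.split()
    return " ".join(
        f"{marker}{w}{marker}" if i in targets else w
        for i, w in enumerate(words)
    )


def build_prompt(sample: EmphasizedSample) -> str:
    """Assemble a hypothetical question asking for the implied meaning."""
    history = "\n".join(sample.context)
    return (
        f"Dialogue so far:\n{history}\n\n"
        "Next utterance (emphasized words marked with *): "
        f"{render_with_emphasis(sample)}\n\n"
        "What does the speaker imply by emphasizing the marked word(s)?"
    )


if __name__ == "__main__":
    s = EmphasizedSample(
        context=["A: Did you finish the report?"],
        utterance="I finished the report",
        emphasized=[0],
        implication="The speaker, not someone else, finished it.",
    )
    print(render_with_emphasis(s))  # *I* finished the report
    print(build_prompt(s))
```

The same words with emphasis on a different index (say, "report" instead of "I") would carry a different implication, which is the kind of contrast the benchmark is described as probing.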
Findings
Commercial LLMs generally outperform open-source models
Current models show limited understanding of emphasis implications
GPT-4-based evaluation correlates well with human judgment
Abstract
Emphasis is a crucial component of human communication, indicating the speaker's intention and implications beyond the pure text of a dialogue. While Large Language Models (LLMs) have revolutionized natural language processing, their ability to understand emphasis in dialogue remains unclear. This paper introduces Emphasized-Talk, a benchmark of emphasis-annotated dialogue samples that capture the implications of emphasis. We evaluate various LLMs, both open-source and commercial, to measure how well they understand emphasis. Additionally, we propose an automatic evaluation pipeline using GPT-4, which achieves a high correlation with human ratings. Our findings reveal that although commercial LLMs generally perform better, there is still significant room for improvement in comprehending emphasized sentences.
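The abstract reports that the GPT-4-based evaluator correlates highly with human ratings but does not say which coefficient was used. As a sketch, assuming per-sample numeric scores from both the automatic judge and human raters, agreement could be checked with Pearson's r (linear agreement) or Spearman's rho (rank agreement, often preferred for ordinal rating scales). The function names and toy scores below are illustrative assumptions, not the paper's data or code.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Toy, made-up scores for five responses (not from the paper).
    human = [1, 3, 4, 4, 5]
    gpt4_judge = [2, 3, 5, 4, 5]
    print(f"Pearson r    = {pearson(human, gpt4_judge):.3f}")
    print(f"Spearman rho = {spearman(human, gpt4_judge):.3f}")
```

On Python 3.10 and later, `statistics.correlation` from the standard library computes Pearson's r directly; the hand-written version is kept here so the arithmetic is visible.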
Taxonomy
Topics: Interpreting and Communication in Healthcare
Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
