HLB: Benchmarking LLMs' Humanlikeness in Language Use

Xufeng Duan; Bei Xiao; Xuemei Tang; Zhenguang G. Cai

arXiv:2409.15890 · cs.CL · September 25, 2024

TL;DR

This paper introduces HLB, a benchmark that uses 10 psycholinguistic experiments to evaluate how closely 20 large language models mimic human language use across sound, word, syntax, semantics, and discourse. It highlights fine-grained differences from human responses and shows that gains on traditional performance metrics do not necessarily make a model more humanlike.

Contribution

The paper presents the first systematic framework for assessing LLMs' humanlikeness in language use through psycholinguistic experiments and a novel response distribution comparison method.
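
As a rough illustration of what comparing response distributions can look like, the sketch below scores a model against human data for a single experimental item. It is not the paper's method: the choice of Jensen-Shannon divergence, the `1 - divergence` score, and the names `response_distribution`, `js_divergence`, and `humanlikeness_score` are all assumptions made for this example, and the sample responses are invented.

```python
import math
from collections import Counter


def response_distribution(responses, categories):
    """Turn a list of categorical responses into a probability vector."""
    counts = Counter(responses)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no responses fall in the given categories")
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def humanlikeness_score(human, model, categories):
    """Hypothetical score: 1 minus the divergence between the two distributions."""
    p = response_distribution(human, categories)
    q = response_distribution(model, categories)
    return 1.0 - js_divergence(p, q)


if __name__ == "__main__":
    # Invented responses for one two-alternative item (assumption, not real data).
    categories = ["option_a", "option_b"]
    human = ["option_a"] * 70 + ["option_b"] * 30
    model = ["option_a"] * 95 + ["option_b"] * 5
    print(round(humanlikeness_score(human, model, categories), 3))
```

Under these assumptions a score of 1 means the model's responses are distributed exactly like the human ones, and 0 means the two sets of responses never overlap. Averaging such scores over items and experiments would give one possible aggregate, though how HLB aggregates is not stated here.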

Findings

1. LLMs show fine-grained differences from human responses across linguistic levels.

2. Improvements in traditional metrics do not necessarily increase humanlikeness.

3. Some LLMs' responses become less humanlike despite better performance metrics.

Abstract

As synthetic data becomes increasingly prevalent in training language models, particularly through generated dialogue, concerns have emerged that these models may deviate from authentic human language patterns, potentially losing the richness and creativity inherent in human communication. This highlights the critical need to assess the humanlikeness of language models in real-world language use. In this paper, we present a comprehensive humanlikeness benchmark (HLB) evaluating 20 large language models (LLMs) using 10 psycholinguistic experiments designed to probe core linguistic aspects, including sound, word, syntax, semantics, and discourse (see https://huggingface.co/spaces/XufengDuan/HumanLikeness). To anchor these comparisons, we collected responses from over 2,000 human participants and compared them to outputs from the LLMs in these experiments. For rigorous evaluation, we…

Taxonomy

Topics: Taxation and Legal Issues · Legal Language and Interpretation · Library Science and Information Systems