Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal

Gonzalo Martínez; Juan Diego Molero; Sandra González; Javier Conde; Marc Brysbaert; Pedro Reviriego

arXiv:2408.16012·cs.CL·August 30, 2024·2 cites

Open Access

TL;DR

This paper demonstrates that large language models, specifically ChatGPT-4o, can accurately estimate psycholinguistic features like concreteness, valence, and arousal for multi-word expressions, providing valuable data for linguistic research.

Contribution

The study systematically evaluates ChatGPT-4o's ability to predict psycholinguistic features of multi-word expressions, finds that it matches or outperforms previous AI models, and offers large-scale datasets of the resulting estimates.

Findings

01

ChatGPT-4o's estimates correlate strongly with human concreteness ratings for multi-word expressions (r = .8).

02

For valence and arousal ratings of individual words, ChatGPT-4o matches or outperforms previous AI models.

03

Large datasets of AI-generated psycholinguistic norms are provided for research use.

Abstract

This study investigates the potential of large language models (LLMs) to provide accurate estimates of concreteness, valence and arousal for multi-word expressions. Unlike previous artificial intelligence (AI) methods, LLMs can capture the nuanced meanings of multi-word expressions. We systematically evaluated ChatGPT-4o's ability to predict concreteness, valence and arousal. In Study 1, ChatGPT-4o showed strong correlations with human concreteness ratings (r = .8) for multi-word expressions. In Study 2, these findings were repeated for valence and arousal ratings of individual words, matching or outperforming previous AI models. Study 3 extended the valence and arousal analysis to multi-word expressions and showed promising results despite the lack of large-scale human benchmarks. These findings highlight the potential of LLMs for generating valuable psycholinguistic data related to…
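The agreement figures in the abstract (for example r = .8) are presumably Pearson correlation coefficients between model estimates and human ratings. The following is a minimal sketch of that comparison, not the authors' code: the function name `pearson_r`, the example expressions, and every number in the two rating lists are hypothetical placeholders invented for illustration, and none of them come from the paper or its datasets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder data: invented ratings on a 1-5 scale,
# NOT values from the paper.
expressions = ["ice cream", "by and large", "kick the bucket", "coffee table"]
human_ratings = [4.8, 1.4, 2.1, 4.9]   # assumed human concreteness norms
model_ratings = [4.6, 1.9, 2.8, 4.7]   # assumed LLM estimates

r = pearson_r(human_ratings, model_ratings)
print(f"Pearson r over {len(expressions)} expressions: {r:.2f}")
```

In practice the two lists would hold one rating per expression from a human norming study and from the model, aligned by expression; whether the paper used Pearson or another coefficient is an assumption here.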

Taxonomy

Topics: Natural Language Processing Techniques