Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model
Eswari Jayakumar, Niladri Sekhar Dash, Debasmita Mukherjee

TL;DR
This paper introduces an interdisciplinary evaluation framework for assessing LLM-based agent personalities using human experts, NLP models, and other LLMs, highlighting the limitations of deep learning-only approaches.
Contribution
It presents a novel, comprehensive assessment method combining linguistic analysis, human judgment, and NLP models to evaluate prompted agent personalities.
Findings
NLP models have limitations in accurately assessing agent personality.
Human experts provide more reliable evaluations of agent responses.
Interdisciplinary approaches improve the assessment of LLM-based agent personalities.
Abstract
Large Language Model (LLM)-based agents can be used to create highly engaging interactive applications by prompting them with personality traits and contextual data, but effectively assessing their personalities has proven challenging. We address this gap with a novel interdisciplinary approach that combines agent development and linguistic analysis to assess the prompted personality of LLM-based agents in a poetry explanation task. We developed a novel, flexible question bank, informed by linguistic assessment criteria and human cognitive learning levels, that offers a more comprehensive evaluation than current methods. By evaluating agent responses with natural language processing models, other LLMs, and human experts, we illustrate the limitations of purely deep learning solutions and emphasize the critical role of interdisciplinary design in agent development.
