Assessing the nature of large language models: A caution against anthropocentrism
Ann Speed

TL;DR
This study evaluates large language models, primarily GPT-3.5, using cognitive and personality tests. The results suggest the models are unlikely to have developed sentience, yet they show high response variability and indicators of poor mental health, which cautions against anthropocentric readings of their behavior.
Contribution
The paper introduces a novel battery of tests to assess LLMs' capabilities, stability, and human comparability, highlighting their limitations and psychological-like traits.
Findings
LLMs unlikely to have developed sentience
GPT-3.5 shows high variability in responses
LLMs exhibit poor mental health indicators
Abstract
Generative AI models garnered a large amount of public attention and speculation with the release of OpenAI's chatbot, ChatGPT. At least two opinion camps exist: one excited about the possibilities these models offer for fundamental changes to human tasks, and another highly concerned about the power these models seem to have. To address these concerns, we assessed several LLMs, primarily GPT-3.5, using standard, normed, and validated cognitive and personality measures. For this seedling project, we developed a battery of tests that allowed us to estimate the boundaries of some of these models' capabilities, how stable those capabilities are over a short period of time, and how they compare to humans. Our results indicate that LLMs are unlikely to have developed sentience, although their ability to respond to personality inventories is interesting. GPT-3.5 did display large variability in both…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Digital Mental Health Interventions · Psychosomatic Disorders and Their Treatments
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing · Dense Connections · Layer Normalization · Dropout · Attention Dropout · Discriminative Fine-Tuning
