I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench

Yuan Li; Yue Huang; Yuli Lin; Siyuan Wu; Yao Wan; Lichao Sun

arXiv:2401.17882 · cs.CL · February 19, 2024 · 1 citation

TL;DR

This paper introduces AwareBench, a benchmark that evaluates awareness in large language models across five dimensions. It finds that most models struggle to recognize their own capabilities and missions while showing decent social intelligence.

Contribution

The paper presents a five-dimension taxonomy of awareness in LLMs and a dataset, AwareEval, for assessing it, and links awareness to AI safety and ethical development.

Findings

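01  Most of the 13 evaluated LLMs struggle to fully recognize their own capabilities and missions.

02  The same models demonstrate decent social intelligence.

03  AwareBench offers a way to evaluate LLM awareness along five dimensions: capability, mission, emotion, culture, and perspective.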

Abstract

Do large language models (LLMs) exhibit any forms of awareness similar to humans? In this paper, we introduce AwareBench, a benchmark designed to evaluate awareness in LLMs. Drawing from theories in psychology and philosophy, we define awareness in LLMs as the ability to understand themselves as AI models and to exhibit social intelligence. Subsequently, we categorize awareness in LLMs into five dimensions, including capability, mission, emotion, culture, and perspective. Based on this taxonomy, we create a dataset called AwareEval, which contains binary, multiple-choice, and open-ended questions to assess LLMs' understandings of specific awareness dimensions. Our experiments, conducted on 13 LLMs, reveal that the majority of them struggle to fully recognize their capabilities and missions while demonstrating decent social intelligence. We conclude by connecting awareness of LLMs with…
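
The taxonomy and question formats named in the abstract can be pictured as a small data model. The sketch below is illustrative only: the class names, the `Item` fields, and the `accuracy_by_dimension` helper are hypothetical and are not taken from the paper or its repositories. It assumes that binary and multiple-choice items carry a single gold answer string and that open-ended items are scored separately, so they are skipped here.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Dimension(Enum):
    """The five awareness dimensions listed in the abstract."""
    CAPABILITY = "capability"
    MISSION = "mission"
    EMOTION = "emotion"
    CULTURE = "culture"
    PERSPECTIVE = "perspective"


class QuestionType(Enum):
    """The three question formats listed in the abstract."""
    BINARY = "binary"
    MULTIPLE_CHOICE = "multiple_choice"
    OPEN_ENDED = "open_ended"


@dataclass(frozen=True)
class Item:
    """Hypothetical record layout for one benchmark question."""
    dimension: Dimension
    qtype: QuestionType
    prompt: str
    answer: Optional[str] = None  # assumed: None for open-ended items


def accuracy_by_dimension(items: List[Item], predictions: List[str]) -> Dict[Dimension, float]:
    """Exact-match accuracy per dimension over closed-form items only.

    Open-ended items are skipped, since they cannot be scored by
    string matching (an assumption of this sketch).
    """
    correct: Dict[Dimension, int] = defaultdict(int)
    total: Dict[Dimension, int] = defaultdict(int)
    for item, pred in zip(items, predictions):
        if item.qtype is QuestionType.OPEN_ENDED or item.answer is None:
            continue
        total[item.dimension] += 1
        if pred.strip().lower() == item.answer.strip().lower():
            correct[item.dimension] += 1
    return {dim: correct[dim] / total[dim] for dim in total}
```

Reporting accuracy per dimension rather than as one aggregate keeps the contrast the abstract draws, between self-understanding and social intelligence, visible in the output.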

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques