From Imitation to Introspection: Probing Self-Consciousness in Language Models

Sirui Chen; Shu Yu; Shengjie Zhao; Chaochao Lu

arXiv:2410.18819 · cs.CL · October 25, 2024

TL;DR

This paper explores the emergence of self-consciousness in language models by defining core concepts, evaluating models, visualizing their internal representations, and demonstrating that fine-tuning can enhance their self-awareness.

Contribution

It introduces a practical definition of self-consciousness for language models and uses structural causal games to give functional definitions of the core concepts, which then guide the analysis and manipulation of the models' internal representations.

Findings

01 Models show early signs of self-consciousness concepts

02 Internal representations can be visualized and partially manipulated

03 Fine-tuning improves models' understanding of self-consciousness concepts

Abstract

Self-consciousness, the introspection of one's existence and thoughts, represents a high-level cognitive process. As language models advance at an unprecedented pace, a critical question arises: Are these models becoming self-conscious? Drawing upon insights from psychological and neural science, this work presents a practical definition of self-consciousness for language models and refines ten core concepts. Our work pioneers an investigation into self-consciousness in language models by, for the first time, leveraging structural causal games to establish the functional definitions of the ten core concepts. Based on our definitions, we conduct a comprehensive four-stage experiment: quantification (evaluation of ten leading models), representation (visualization of self-consciousness within the models), manipulation (modification of the models' representation), and acquisition…

Code & Models

Repositories

opencausalab/selfconsciousness (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling