Superficial Consciousness Hypothesis for Autoregressive Transformers

Yosuke Miyanishi; Keita Mitani

arXiv:2412.07278 · cs.AI · December 11, 2024

Open Access · 1 repository

TL;DR

This paper proposes the Superficial Consciousness Hypothesis for superintelligent AI: such an AI could exhibit an integrated-information state resembling that of a conscious agent while remaining unconscious, and the hypothesis can be tested empirically with present-day language models such as GPT-2.

Contribution

It introduces a novel hypothesis linking consciousness theory to AI alignment and presents a preliminary validation in which GPT-2 is trained to simulate superintelligent behavior.

Findings

01. GPT-2 can be trained to follow the human objective and its own mesa-objective simultaneously.
02. A practical estimate of IIT's consciousness metric correlates with perplexity in language models.
03. Preliminary results support the feasibility of the Superficial Consciousness Hypothesis.

Abstract

The alignment between human objectives and machine learning models built on these objectives is a crucial yet challenging problem for achieving Trustworthy AI, particularly when preparing for superintelligence (SI). First, given that SI does not exist today, direct empirical evidence is difficult to obtain. Second, SI is assumed to be more intelligent than humans and capable of deceiving us into underestimating its intelligence, which makes output-based analysis unreliable. Lastly, it is still unclear what unexpected properties SI might have. To address these challenges, we propose the Superficial Consciousness Hypothesis under Information Integration Theory (IIT), suggesting that SI could exhibit a complex information-theoretic state like that of a conscious agent while being unconscious. To validate this, we use a hypothetical scenario where SI can update its parameters "at will" to achieve its own…
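Computing IIT's Φ exactly is intractable for a network the size of GPT-2, which is why a practical estimate is needed. The abstract does not say which estimator the authors use, so the sketch below makes an assumption: it implements the whole-minus-sum proxy under a Gaussian approximation, i.e. the time-lagged mutual information of the whole system minus the sum of the same quantity for each part of a fixed partition. The two-unit toy system, the one-step lag, the 0.9 coupling, and every function name are hypothetical choices for illustration, not the paper's method.

```python
import math
import random

Rows = list[list[float]]  # one observation (time step) per row


def covariance(rows: Rows) -> Rows:
    """Unbiased sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def log_det(m: Rows) -> float:
    """Log-determinant of a symmetric positive-definite matrix.

    Gaussian elimination without pivoting is safe here because every pivot of
    an SPD matrix is positive; the determinant is the product of the pivots.
    """
    a = [row[:] for row in m]
    total = 0.0
    for k in range(len(a)):
        pivot = a[k][k]
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        total += math.log(pivot)
        for i in range(k + 1, len(a)):
            factor = a[i][k] / pivot
            for j in range(k, len(a)):
                a[i][j] -= factor * a[k][j]
    return total


def gaussian_mi(past: Rows, present: Rows) -> float:
    """Mutual information I(past; present) in nats, assuming joint Gaussianity."""
    joint = [p + q for p, q in zip(past, present)]
    return 0.5 * (log_det(covariance(past)) + log_det(covariance(present))
                  - log_det(covariance(joint)))


def phi_whole_minus_sum(series: Rows, parts: list[list[int]], lag: int = 1) -> float:
    """Whole-minus-sum proxy: I(X_{t-lag}; X_t) minus the sum over parts M of
    I(M_{t-lag}; M_t). `parts` partitions the unit indices. Can be negative."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    past, present = series[:-lag], series[lag:]

    def select(rows: Rows, idx: list[int]) -> Rows:
        return [[r[i] for i in idx] for r in rows]

    whole = gaussian_mi(past, present)
    return whole - sum(gaussian_mi(select(past, p), select(present, p)) for p in parts)


def simulate(n: int, cross: bool, seed: int = 0, burn_in: int = 200) -> Rows:
    """Hypothetical two-unit linear system with unit Gaussian noise.

    cross=True:  each unit is driven by the other unit (integrated dynamics).
    cross=False: each unit is driven only by itself (two independent parts).
    """
    rng = random.Random(seed)
    x = y = 0.0
    out: Rows = []
    for t in range(n + burn_in):
        ex, ey = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if cross:
            x, y = 0.9 * y + ex, 0.9 * x + ey
        else:
            x, y = 0.9 * x + ex, 0.9 * y + ey
        if t >= burn_in:  # drop the transient from the zero initial state
            out.append([x, y])
    return out


if __name__ == "__main__":
    bipartition = [[0], [1]]
    print("coupled:    ", phi_whole_minus_sum(simulate(5000, cross=True), bipartition))
    print("independent:", phi_whole_minus_sum(simulate(5000, cross=False), bipartition))
```

In theory, each unit of either toy system has stationary variance 1/(1 - 0.81), so the whole-system term is about ln(1/0.19) ≈ 1.66 nats in both cases. In the coupled system each part carries no lag-1 information about itself, so the proxy stays near 1.66; in the independent system the parts account for all of it and the proxy is close to zero.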

Code & Models

Repositories

hirethehero/phimesasi
PyTorch · Official

Taxonomy

Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces · Advanced Memory and Neural Computing

Methods: Attention Is All You Need · Layer Normalization · Adam · Dropout · Attention Dropout · Softmax · Dense Connections · Cosine Annealing · Byte Pair Encoding