Superficial Consciousness Hypothesis for Autoregressive Transformers
Yosuke Miyanishi, Keita Mitani

TL;DR
This paper proposes the Superficial Consciousness Hypothesis: a superintelligent AI could exhibit the complex information-theoretic state of a conscious agent, as described under Information Integration Theory (IIT), while remaining unconscious. The hypothesis is tested empirically with language models such as GPT-2.
Contribution
It introduces a novel hypothesis linking consciousness theory to AI alignment and demonstrates a preliminary validation using GPT-2 to simulate superintelligent behavior.
Findings
GPT-2 can be trained to follow both human and mesa-objectives simultaneously.
A practical consciousness metric correlates with perplexity in language models.
Preliminary results support the feasibility of the Superficial Consciousness Hypothesis.
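To make the second finding concrete, here is a minimal sketch of how perplexity is computed from per-token log-probabilities and how its correlation with another per-sample score could be checked. It is not the paper's code: the function names `perplexity` and `pearson` are hypothetical, the consciousness metric itself is not implemented here, and any scores passed in would have to come from the paper's own metric.

```python
import math


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood per token.

    `token_log_probs` holds natural-log probabilities that a language
    model assigned to each observed token of one sample.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Sanity check: a model that gives every token probability 1/4
    # has a perplexity of about 4.
    print(perplexity([math.log(0.25)] * 8))
```

Given one perplexity value and one metric value per sample, `pearson(perplexities, metric_values)` returns a number in [-1, 1]. The sign and strength of the relationship reported in the paper are not stated in this summary, so none are assumed here.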
Abstract
The alignment between human objectives and machine learning models built on these objectives is a crucial yet challenging problem for achieving Trustworthy AI, particularly when preparing for superintelligence (SI). First, given that SI does not exist today, empirical analysis for direct evidence is difficult. Second, SI is assumed to be more intelligent than humans, capable of deceiving us into underestimating its intelligence, making output-based analysis unreliable. Lastly, what kind of unexpected property SI might have is still unclear. To address these challenges, we propose the Superficial Consciousness Hypothesis under Information Integration Theory (IIT), suggesting that SI could exhibit a complex information-theoretic state like a conscious agent while unconscious. To validate this, we use a hypothetical scenario where SI can update its parameters "at will" to achieve its own…
Taxonomy
Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces · Advanced Memory and Neural Computing
Methods: Attention Is All You Need · Layer Normalization · Adam · Dropout · Attention Dropout · Softmax · Dense Connections · Cosine Annealing · Byte Pair Encoding
