Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings
Stephen Fitz

TL;DR
This study investigates whether GPT-3.5's sentence embeddings encode moral dimensions by analyzing their topological structure with respect to a fairness metric, revealing a separation into fair and unfair submanifolds.
Contribution
The paper introduces a novel topological visualization method to analyze moral dimensions in GPT embeddings, demonstrating that these models develop an understanding of fairness during training.
Findings
GPT-3.5 embeddings form two submanifolds for fair and unfair judgments
The topological analysis reveals an emergent moral dimension in language representations
The fairness metric effectively captures moral distinctions in sentence embeddings
Abstract
As Large Language Models are deployed within Artificial Intelligence systems that are increasingly integrated with human society, it becomes more important than ever to study their internal structures. Higher-level abilities of LLMs such as GPT-3.5 emerge in large part due to the informative language representations they induce from raw text data during pre-training on trillions of words. These embeddings exist in vector spaces of several thousand dimensions, and their processing involves mapping between multiple vector spaces, with a total number of parameters on the order of trillions. Furthermore, these language representations are induced by gradient optimization, resulting in a black-box system that is hard to interpret. In this paper, we take a look at the topological structure of neuronal activity in the "brain" of Chat-GPT's foundation language model, and analyze it with respect to a…
Taxonomy
Topics: Topological and Geometric Data Analysis
Methods: Attention Is All You Need · Softmax · Dense Connections · Linear Layer · Attention Dropout · Residual Connection · Adam
