Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions
Andrew Lee, Fernanda Viégas, Martin Wattenberg

TL;DR
This paper investigates how language models encode concepts and relations, showing that tensor product representation (TPR) probes uncover shared structure that a bag of independent linear directions does not capture.
Contribution
It introduces TPR probes to analyze structured representations in a model trained on Othello, demonstrating the presence of shared, factorized structures in internal representations.
Findings
Linearly decodable board-state representations exist in the model.
TPR probes reveal shared structure among linear directions.
Linear probes can be recovered from TPR probe parameters.
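To make the last finding concrete, here is a minimal sketch of how per-square linear probes might be composed from TPR probe parameters. It assumes one particular bilinear form, in which the probe direction for square s and color c is the binding matrix applied to the flattened outer product of the square-embedding and the color-embedding. The exact parameterization used in the paper is not given in this summary, so the function names, shapes, and toy dimensions below are hypothetical.

```python
import numpy as np


def linear_probes_from_tpr(square_emb, color_emb, binding):
    """Compose one linear probe direction per (square, color) pair.

    Assumed (hypothetical) shapes:
      square_emb: (n_squares, d_s)     one embedding per board square
      color_emb:  (n_colors, d_c)      one embedding per square state
      binding:    (d_model, d_s * d_c) maps bound pairs into model space
    Returns an array of shape (n_squares, n_colors, d_model).
    """
    n_squares, d_s = square_emb.shape
    n_colors, d_c = color_emb.shape
    # Outer product of every square-embedding with every color-embedding,
    # flattened so that index i * d_c + j holds square_emb[s, i] * color_emb[c, j].
    bound = np.einsum("si,cj->scij", square_emb, color_emb)
    bound = bound.reshape(n_squares, n_colors, d_s * d_c)
    # Apply the binding matrix to each bound pair.
    return bound @ binding.T


def decode_board(hidden, probes):
    """Predict a state index for each square from one hidden vector."""
    # logits[s, c] is the dot product of the hidden state with probe (s, c).
    logits = np.einsum("d,scd->sc", hidden, probes)
    return logits.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy sizes for illustration only: 64 squares, 3 states per square.
    squares = rng.normal(size=(64, 8))
    colors = rng.normal(size=(3, 4))
    binding = rng.normal(size=(32, 8 * 4))
    probes = linear_probes_from_tpr(squares, colors, binding)
    board = decode_board(rng.normal(size=32), probes)
    print(probes.shape, board.shape)
```

Under this assumed form, all 64 x 3 probe directions share the same binding matrix and embeddings, which is one way to read the claim that the linear directions have shared, factorized structure.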
Abstract
While researchers are finding concepts represented as linear directions in language models, a bag of linear directions fails to capture relational structure. To better understand this dichotomy, we study a model with known linear representations, but trained in a highly structured domain -- the board game Othello. While the model's internal board-state representation is linearly decodable, we find additional structure in the form of tensor product representations (TPRs). We train TPR probes to recover shared structure amongst the linear probes, yielding a factorization into square-embeddings, color-embeddings, and a binding matrix that composes them to construct the model's board-state representation. We find geometric signatures within the weights of our TPR probe that align with the structure of the board, but perhaps more importantly, that the linear probes can be recovered directly…
