Constructing Efficient Fact-Storing MLPs for Transformers
Owen Dugan, Roberto Garcia, Ronny Junkins, Jerry Liu, Dylan Zinsley, Sabri Eyuboglu, Atri Rudra, Chris Ré

TL;DR
This paper introduces an improved framework for constructing fact-storing MLPs in Transformers, achieving near-optimal efficiency, broad applicability, and practical mechanisms for factual recall and editing in language models.
Contribution
The paper presents an MLP construction framework that is more general and more parameter-efficient than previous constructions, and that remains usable within Transformers for factual recall.
Findings
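Achieves asymptotically optimal parameter efficiency for fact storage, matching information-theoretic bounds for some embeddings.
Discovers a metric on value embeddings that characterizes facts-per-parameter scaling for both constructed and gradient-descent-trained MLPs.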
Demonstrates modular fact editing in Transformers.
Abstract
The success of large language models (LLMs) can be attributed in part to their ability to efficiently store factual knowledge as key-value mappings within their MLP parameters. Recent work has proposed explicit weight constructions to build such fact-storing MLPs, providing an improved understanding of LLM fact storage mechanisms. In this paper, we introduce an MLP construction framework that improves over previous constructions in three areas: it 1) works for all but a measure-zero set of feasible input-output pairs, 2) achieves asymptotically optimal parameter efficiency matching information-theoretic bounds for some embeddings, and 3) maintains usability within Transformers for factual recall. Through our improvements, we 1) discover a metric on value embeddings that characterizes facts-per-parameter scaling for both constructed and gradient-descent-trained MLPs, 2) identify a simple…
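To make the "key-value mapping" idea concrete, here is a minimal toy sketch in plain Python. It is not the construction from the paper: it assumes orthonormal key embeddings and spends one hidden neuron per fact, which is the naive baseline that more efficient constructions aim to beat. All names (build_fact_mlp, mlp, decode, value_table) are hypothetical and chosen only for illustration.

```python
# Toy illustration only: one hidden ReLU neuron per fact.
# Assumes orthonormal keys; this is NOT the paper's construction.

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def relu(vector):
    return [max(0.0, v) for v in vector]

def build_fact_mlp(keys, values):
    """Build (w_in, w_out) so that mlp(keys[i]) equals values[i]."""
    w_in = [list(k) for k in keys]                 # hidden unit i fires only on key i
    w_out = [list(col) for col in zip(*values)]    # column i of w_out is value i
    return w_in, w_out

def mlp(w_in, w_out, x):
    return matvec(w_out, relu(matvec(w_in, x)))

def decode(output, value_table):
    """Return the index of the value embedding with the largest dot product."""
    scores = [sum(a * b for a, b in zip(output, v)) for v in value_table]
    return scores.index(max(scores))

# Three orthonormal key embeddings and three distinct value embeddings.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
value_table = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Facts to store: key i maps to value_table[facts[i]].
facts = [2, 0, 1]
values = [value_table[j] for j in facts]

w_in, w_out = build_fact_mlp(keys, values)
recalled = [decode(mlp(w_in, w_out, k), value_table) for k in keys]
```

In this sketch the parameter count grows as the number of facts times the sum of the input and output dimensions, so it says nothing about the facts-per-parameter scaling the abstract describes; it only shows what "recalling a fact" from MLP weights means.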
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
