Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data
Eugenia Iofinova, Dan Alistarh

TL;DR
Behemoth introduces a synthetic data framework to better understand and evaluate model editing in large language models, revealing insights into how data and update strategies affect editing effectiveness.
Contribution
The paper presents a novel synthetic data generation framework, Behemoth, enabling controlled analysis of model editing effects in large language models.
Findings
Restricting the rank of the weight update can make model edits more effective.
Synthetic data helps uncover interactions between training data and model updates.
Insights from simple tabular data experiments echo some real-world model editing results.
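As an illustration of what restricting the rank of an update can mean, the sketch below truncates a weight update to its top singular directions before applying it. This is one common way to constrain rank and is not taken from the paper; the function name `restrict_update_rank`, the matrix shapes, and the random data are hypothetical and chosen only for the example.

```python
import numpy as np


def restrict_update_rank(delta, rank):
    """Return the best rank-`rank` approximation of a weight update.

    Uses a truncated SVD; this is an illustrative assumption,
    not the procedure described in the paper.
    """
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    k = min(rank, s.shape[0])
    # Keep only the k largest singular values and their vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 6))       # hypothetical pretrained weight matrix
full_update = rng.normal(size=(8, 6))   # hypothetical unconstrained edit
low_rank_update = restrict_update_rank(full_update, rank=2)
edited_weights = weights + low_rank_update
```

Lowering `rank` limits how many independent directions in weight space the edit can change, which is the kind of constraint the finding above refers to.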
Abstract
As artificial neural networks, and specifically large language models, have improved rapidly in capabilities and quality, they have increasingly been deployed in real-world applications, from customer service to Google search, despite the fact that they frequently make factually incorrect or undesirable statements. This trend has inspired practical and academic interest in model editing, that is, in adjusting the weights of the model to modify its likely outputs for queries relating to a specific fact or set of facts. This may be done either to amend a fact or set of facts, for instance, to fix a frequent error in the training data, or to suppress a fact or set of facts entirely, for instance, in case of dangerous knowledge. Multiple methods have been proposed to do such edits. However, at the same time, it has been shown that such model editing can be brittle and incomplete. Moreover…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
