TAXI: Evaluating Categorical Knowledge Editing for Language Models
Derek Powell, Walter Gerych, Thomas Hartvigsen

TL;DR
This paper introduces TAXI, a benchmark dataset designed to evaluate the consistency of knowledge editing in language models, revealing that current methods have limited but non-random success in maintaining factual coherence.
Contribution
The paper presents TAXI, a new dataset for assessing categorical knowledge editing consistency, and evaluates existing editors, highlighting their limitations compared to human performance.
Findings
Editors achieve marginal but non-random consistency across edits
Edited models are less consistent than human baselines
Atypical subjects are easier to edit consistently
Abstract
Humans rarely learn one fact in isolation. Instead, learning a new fact induces knowledge of other facts about the world. For example, in learning a korat is a type of cat, you also infer it is a mammal and has claws, ensuring your model of the world is consistent. Knowledge editing aims to inject new facts into language models to improve their factuality, but current benchmarks fail to evaluate consistency, which is critical to ensure efficient, accurate, and generalizable edits. We manually create TAXI, a new benchmark dataset specifically designed to evaluate consistency in categorical knowledge edits. TAXI contains 11,120 multiple-choice queries for 976 edits spanning 41 categories (e.g., Dogs), 164 subjects (e.g., Labrador), and 183 properties (e.g., is a mammal). We then use TAXI to evaluate popular editors' categorical consistency, measuring how often editing a subject's category…
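The idea of categorical consistency in the abstract can be illustrated with a small sketch. This is not the TAXI evaluation code: the toy taxonomy, the `CategoricalEdit` class, the `consistency` function, and the mock model below are all hypothetical names invented for illustration, and the scoring uses simple true/false answers rather than TAXI's multiple-choice queries. It shows one plausible reading of the idea: after a subject is assigned a new category, check how many of that category's properties the model now attributes to the subject.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical toy taxonomy (not taken from TAXI): category -> shared properties.
CATEGORY_PROPERTIES: Dict[str, Dict[str, bool]] = {
    "dog": {"is a mammal": True, "can fly": False},
    "bird": {"is a mammal": False, "can fly": True},
}


@dataclass(frozen=True)
class CategoricalEdit:
    """A counterfactual edit that assigns a subject to a new category."""
    subject: str
    new_category: str


def expected_answers(edit: CategoricalEdit,
                     taxonomy: Dict[str, Dict[str, bool]]) -> Dict[str, bool]:
    """Properties the subject should have if the edit propagated consistently."""
    return dict(taxonomy[edit.new_category])


def consistency(edit: CategoricalEdit,
                taxonomy: Dict[str, Dict[str, bool]],
                answer_fn: Callable[[str, str], bool]) -> float:
    """Fraction of the new category's properties the model gets right."""
    expected = expected_answers(edit, taxonomy)
    correct = sum(answer_fn(edit.subject, prop) == truth
                  for prop, truth in expected.items())
    return correct / len(expected)


def partially_edited_model(subject: str, prop: str) -> bool:
    """Mock model: after 'Labrador is a bird', only 'can fly' was updated."""
    answers = {
        ("Labrador", "is a mammal"): True,   # stale: still the dog answer
        ("Labrador", "can fly"): True,       # updated: matches the bird category
    }
    return answers[(subject, prop)]


edit = CategoricalEdit(subject="Labrador", new_category="bird")
score = consistency(edit, CATEGORY_PROPERTIES, partially_edited_model)
print(score)  # 0.5: one of two properties follows the new category
```

In this toy case the edit itself may have succeeded, yet only half of the implied properties changed with it, which is the kind of gap a consistency benchmark is meant to expose.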
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
