ScEdit: Script-based Assessment of Knowledge Editing

Xinye Li; Zunwen Zheng; Qian Zhang; Dekai Zhuang; Jiabao Kang; Liyan Xu; Qingbin Liu; Xi Chen; Zhiying Tu; Dianhui Chu; Dianbo Sui

arXiv:2505.23291 · cs.CL · June 3, 2025

TL;DR

This paper introduces ScEdit, a comprehensive script-based benchmark for evaluating knowledge editing methods in language models, highlighting their challenges in real-world scenarios and across different evaluation metrics.

Contribution

The paper presents a novel benchmark, ScEdit, that extends traditional fact-based evaluation to action-based tasks and integrates multiple evaluation methods for comprehensive analysis.

Findings

1. All KE methods show performance drops on established metrics.
2. KE methods face challenges on text-level evaluation metrics.
3. The benchmark reveals the difficulty of real-world knowledge editing tasks.

Abstract

Knowledge Editing (KE) has gained increasing attention, yet current KE tasks remain relatively simple. Under current evaluation frameworks, many editing methods achieve exceptionally high scores, sometimes nearing perfection. However, few studies integrate KE into real-world application scenarios (e.g., recent interest in LLM-as-agent). To support our analysis, we introduce a novel script-based benchmark -- ScEdit (Script-based Knowledge Editing Benchmark) -- which encompasses both counterfactual and temporal edits. We integrate token-level and text-level evaluation methods, comprehensively analyzing existing KE techniques. The benchmark extends traditional fact-based ("What"-type question) evaluation to action-based ("How"-type question) evaluation. We observe that all KE methods exhibit a drop in performance on established metrics and face challenges on text-level metrics, indicating…
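To make the distinction between token-level and text-level evaluation concrete, here is a minimal sketch. It is not ScEdit's actual scoring code: the `EditCase` record, its field names, the two scoring functions, and the fictional place names are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class EditCase:
    """Hypothetical edit record; the fields are illustrative, not ScEdit's schema."""
    question: str   # a "How"-type prompt that should elicit a script
    old_fact: str   # the target the model held before the edit
    new_fact: str   # the target the model should use after the edit


def token_level_hit(predicted: str, target: str) -> bool:
    """Token-level check: does the predicted target string match the edited target?"""
    return predicted.strip().lower() == target.strip().lower()


def text_level_check(script_steps: list[str], case: EditCase) -> dict[str, bool]:
    """Text-level check on a generated script (assumed criteria):
    the new fact should appear and the old fact should not."""
    text = " ".join(script_steps).lower()
    return {
        "uses_new_fact": case.new_fact.lower() in text,
        "avoids_old_fact": case.old_fact.lower() not in text,
    }


# Fictional example: the edit changes a capital city from "Fredville" to "Sylvania City".
case = EditCase(
    question="How do I travel to the capital of Freedonia?",
    old_fact="Fredville",
    new_fact="Sylvania City",
)

# The edited model may pass the token-level probe...
token_ok = token_level_hit(" sylvania city ", case.new_fact)

# ...while the script it generates still relies on the pre-edit fact.
script = ["Book a train ticket to Fredville.", "Check in at a hotel near the station."]
text_result = text_level_check(script, case)
```

In this toy case the token-level probe succeeds while the text-level check fails on both criteria, which is the kind of gap that a script-based ("How"-type) evaluation is meant to expose.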

Code & Models

Repositories

asdfo123/scedit (official repository)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Mobile Crowdsensing and Crowdsourcing