TL;DR
CrispEdit is a scalable second-order LLM editing method that preserves capabilities by enforcing low-curvature constraints during model updates, improving success rates and reducing degradation.
Contribution
It introduces a novel second-order editing algorithm using low-curvature projections and Bregman divergence, unifying and enhancing existing approaches.
Findings
Achieves high edit success rates on standard benchmarks.
Maintains capability degradation below 1% on average.
Outperforms prior editing methods significantly.
Abstract
A central challenge in large language model (LLM) editing is capability preservation: methods that successfully change targeted behavior can quietly game the editing proxy and corrupt general capabilities, producing degenerate behaviors reminiscent of proxy/reward hacking. We present CrispEdit, a scalable and principled second-order editing algorithm that treats capability preservation as an explicit constraint, unifying and generalizing several existing editing approaches. CrispEdit formulates editing as constrained optimization and enforces the constraint by projecting edit updates onto the low-curvature subspace of the capability-loss landscape. At the crux of CrispEdit is expressing the capability constraint via a Bregman divergence, whose quadratic form yields the Gauss-Newton Hessian exactly, even when the base model is not trained to convergence. We make this second-order procedure…
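The projection step described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes a small dense Gauss-Newton Hessian built from a made-up Jacobian under a squared-error capability loss, and the names `low_curvature_projector`, `num_high`, `J`, and `g` are hypothetical. It only shows what "projecting an edit update onto the low-curvature subspace" means in the simplest dense setting; the scalable procedure the abstract alludes to is not reproduced here.

```python
import numpy as np

def low_curvature_projector(H, num_high):
    """Projector onto the eigenvectors of H with the smallest eigenvalues.

    H        : symmetric PSD curvature matrix (here a toy Gauss-Newton Hessian)
    num_high : number of highest-curvature directions to exclude (assumed knob)
    """
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors
    _, eigvecs = np.linalg.eigh(H)
    low = eigvecs[:, : H.shape[0] - num_high]
    return low @ low.T

rng = np.random.default_rng(0)
d = 6                                # toy parameter dimension
J = rng.normal(size=(2, d))          # hypothetical Jacobian of capability outputs
H = J.T @ J                          # Gauss-Newton Hessian for squared loss (rank 2)

P = low_curvature_projector(H, num_high=2)

g = rng.normal(size=d)               # hypothetical raw edit update
g_proj = P @ g                       # update restricted to low-curvature directions

# Second-order estimate of the capability-loss change caused by each update
curv_before = g @ H @ g
curv_after = g_proj @ H @ g_proj
print(f"curvature cost before: {curv_before:.4f}, after: {curv_after:.2e}")
```

In this toy case the excluded directions are exactly the two with nonzero curvature, so the projected update leaves the quadratic capability penalty at numerical zero while still moving the parameters. In a real model the curvature spectrum is not cleanly low-rank, so the cutoff becomes a tunable trade-off between edit strength and capability preservation.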
