Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning
Minseok Choi, ChaeHun Park, Dohyun Lee, Jaegul Choo

TL;DR
This paper shows that current unlearning methods for large language models fall short on multi-hop queries, and proposes MUNCH, an uncertainty-based approach that removes such knowledge more completely.
Contribution
The paper introduces MUNCH, a novel uncertainty-based method that effectively unlearns multi-hop knowledge in LLMs and can be integrated with existing techniques.
Findings
Existing methods fail to fully erase multi-hop knowledge when one of the intermediate hops is unlearned.
MUNCH significantly improves unlearning effectiveness on multi-hop queries.
MUNCH is compatible with current unlearning techniques.
Abstract
Large language models (LLMs) serve as giant information stores, often including personal or copyrighted data, and retraining them from scratch is not a viable option. This has led to the development of various fast, approximate unlearning techniques to selectively remove knowledge from LLMs. Prior research has largely focused on minimizing the probabilities of specific token sequences by reversing the language modeling objective. However, these methods still leave LLMs vulnerable to adversarial attacks that exploit indirect references. In this work, we examine the limitations of current unlearning techniques in effectively erasing a particular type of indirect prompt: multi-hop queries. Our findings reveal that existing methods fail to completely remove multi-hop knowledge when one of the intermediate hops is unlearned. To address this issue, we propose MUNCH, a simple uncertainty-based…
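The abstract describes prior unlearning work as reversing the language modeling objective to minimize the probability of specific token sequences. The toy sketch below illustrates only that idea, not MUNCH itself, whose details are cut off above. The function names and the probability values are hypothetical and chosen purely for illustration; a real setup would compute these probabilities from a model's logits.

```python
import math

def lm_loss(token_probs):
    """Standard language modeling loss: average negative log-likelihood
    of the target tokens, given the probability the model assigns to each."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def unlearning_loss(token_probs):
    """Reversed objective (assumed here to be a simple sign flip):
    minimizing this value drives the target-token probabilities down."""
    return -lm_loss(token_probs)

# Hypothetical per-token probabilities for a sequence to be forgotten.
probs_before = [0.9, 0.8, 0.95]   # model still recalls the sequence
probs_after = [0.1, 0.05, 0.2]    # model after (assumed) unlearning steps

print("LM loss before:", lm_loss(probs_before))
print("LM loss after: ", lm_loss(probs_after))
print("Unlearning loss before:", unlearning_loss(probs_before))
print("Unlearning loss after: ", unlearning_loss(probs_after))
```

Lower target-token probabilities give a lower unlearning loss, which is why optimizing it suppresses the direct sequence. Nothing in this objective constrains how the model answers an indirect, multi-hop question that depends on the same fact, which is the gap the abstract points to.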
Taxonomy
Topics: Higher Education Learning Practices · Online and Blended Learning · Education and Critical Thinking Development
