Forgetting-MarI: LLM Unlearning via Marginal Information Regularization
Shizhou Xu, Yuan Ni, Stefan Broecker, Thomas Strohmer

TL;DR
Forgetting-MarI is a novel LLM unlearning framework that selectively removes only the marginal information of specific data, ensuring privacy compliance while maintaining model performance.
Contribution
It introduces a provably effective unlearning method that bounds residual influence, outperforming existing techniques in reliability and performance preservation.
Findings
Outperforms current state-of-the-art unlearning methods
Provides provable guarantees of unlearning effectiveness
Maintains general model performance across benchmarks
Abstract
As AI models are trained on ever-expanding datasets, the ability to remove the influence of specific data from trained models has become essential for privacy protection and regulatory compliance. Unlearning addresses this challenge by selectively removing parametric knowledge from the trained models without retraining from scratch, which is critical for resource-intensive models such as Large Language Models (LLMs). Existing unlearning methods often degrade model performance by removing more information than necessary when attempting to "forget" specific data. We introduce Forgetting-MarI, an LLM unlearning framework that provably removes only the additional (marginal) information contributed by the data to be unlearned, while preserving the information supported by the data to be retained. By penalizing marginal information, our method yields an explicit upper bound on the unlearn…
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
- [S1] **Interesting conceptual direction.** The idea of designing an unlearning loss derived from information-theoretic principles is conceptually interesting. The focus on isolating and penalizing marginal information, if made rigorous and well-justified, could offer a new angle for formalizing unlearning objectives in LLMs.
- [W1] **Poor presentation and clarity.** The paper is difficult to follow overall and would benefit from a substantial reorganization and clearer exposition. Several key issues include:
  - The classification of unlearning methods into “marginal information unlearning” and “full unlearning” is not well-motivated. As the paper itself notes, most existing methods already mix forgetting and retention signals, making them marginal in nature. Therefore, the distinction does not meaningfully clarify
1. Relevance and Impact: The paper tackles a highly critical and timely challenge in AI: efficient and effective model unlearning for LLMs. Given the size and computational cost of LLMs, incremental unlearning solutions that preserve utility are essential for real-world deployment and regulatory compliance.
2. Theoretical Clarity: The concept of Marginal Information Regularization is theoretically intuitive and well-aligned with the goal of mitigating catastrophic forgetting. By directly targeti

1. Utility Trade-off on General Capabilities: The proposed method exhibits the lowest performance on the MMLU general capability benchmark compared to baselines (Table 5). This suggests a potential trade-off where the high unlearning efficacy might be achieved at the cost of a significant reduction in the model's general utility. A successful unlearning strategy should aim to maximize forgetting while minimally impacting core, general knowledge.
2. Limited Cross-Scenario Generalization and Scal

1. Conceptual novelty: The paper introduces a clear and elegant information-theoretic framing of LLM unlearning via marginal information, offering a well-motivated bridge between data privacy and utility preservation.
2. Theoretical rigor: Derivations are mathematically sound, and the theorems (especially Theorems 2.1 and 2.2) link mutual information bounds to empirical detectability in a principled way.
3. Strong empirical trends: Forgetting-MarI outperforms or matches prior unlearning methods

1. Experimental scale is limited. All experiments are on small to mid-scale models (GPT-2 Large, Llama-1B). It is unclear whether the proposed method scales to realistic 7B–70B LLMs, where unlearning challenges become severe. The tasks (Harry Potter and Careless People) are relatively narrow and do not demonstrate generalization beyond text completion.
2. Weak connection between theory and empirical validation. The theoretical guarantees rely on information-theoretic quantities (MI, JSD) that ar

1. The paper introduces a principled definition of marginal information using mutual information between retain-only and retain+unlearn distributions, giving a more grounded objective than heuristic gradient-ascent–based approaches.
2. Instead of over-forgetting (removing all related knowledge), it focuses on erasing only the incremental contribution of the unlearned data, better aligned with legal and practical expectations of data removal.
3. Theoretical results bound the residual mutual info

1. The proposed method requires full forward passes over both retain and unlearn sets to estimate token-level or pooled distributions and mutual information, which may be significantly more expensive than lightweight methods. There is no concrete runtime, GPU memory, or scalability analysis for larger models.
2. Although elegant theoretically, estimating JSD across token distributions of two datasets at every update step could be costly and unstable in heterogeneous or long-sequence settings. N
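
The reviews above describe marginal information as a mutual-information quantity between retain-only and retain+unlearn distributions, estimated through the Jensen-Shannon divergence (JSD) over token distributions. The sketch below only illustrates that quantity on toy data; it is not the authors' loss. It assumes equal mixture weights (under which JSD equals the mutual information between a sample and a fair binary label saying which distribution it came from), and the names `p_retain` and `p_retain_plus_unlearn` as well as the 4-token vocabulary are hypothetical. How the paper pools or weights the distributions is not stated in this summary.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Equal-weight JSD in nats, bounded between 0 and ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def normalize(counts: Sequence[float]) -> list:
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical pooled next-token distributions over a 4-token vocabulary
# (illustrative numbers, not taken from the paper).
p_retain = normalize([40, 30, 20, 10])
p_retain_plus_unlearn = normalize([25, 25, 25, 25])

# A penalty of this form is 0 when the unlearn data adds nothing beyond
# what the retain data already supports, and grows as the two diverge.
penalty = jensen_shannon_divergence(p_retain, p_retain_plus_unlearn)
print(f"toy marginal-information penalty: {penalty:.4f} nats")
```

In an actual training loop the distributions would come from model forward passes over the two datasets, which is the cost the last review points to.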
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
