Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU
Arjun Dosajh, Mihika Sanghi

TL;DR
This paper presents an adaptive unlearning method for large language models to remove sensitive information, improving privacy and security while analyzing layer-specific effects and achieving competitive leaderboard rankings.
Contribution
It applies the Adaptive RMU technique to unlearn sensitive information, including PII, from LLMs and evaluates its effectiveness across model sizes and decoder layers.
Findings
Adaptive RMU is used to unlearn sensitive data from LLMs.
Layer-specific analysis identifies the decoder layers where unlearning is most effective.
Ranked 4th on the official leaderboard for both the 1B-parameter and 7B-parameter models.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation. However, their tendency to memorize training data raises concerns regarding privacy, copyright compliance, and security, particularly in cases involving Personally Identifiable Information (PII). Effective machine unlearning techniques are essential to mitigate these risks, yet existing methods remain underdeveloped for LLMs due to their open-ended output space. In this work, we apply the Adaptive Representation Misdirection Unlearning (RMU) technique to unlearn sensitive information from LLMs. Through extensive experiments, we analyze the effects of unlearning across different decoder layers to determine the most effective regions for sensitive information removal. Our technique ranked 4th on the official leaderboard for both the 1B-parameter and 7B-parameter models.
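The abstract does not spell out the training objective. As a rough illustration only, the sketch below follows the commonly described RMU formulation with an adaptive scaling term: activations of the model being updated are pushed toward a fixed random direction on forget data and kept close to a frozen copy's activations on retain data. The function names, the `beta` and `alpha` coefficients, and the use of plain Python lists in place of real hidden states are all assumptions made for this example; it is not the authors' implementation.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def random_unit_vector(dim, seed=0):
    """Fixed random direction u; sampled once and reused during unlearning."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = l2_norm(v)
    return [x / n for x in v]


def mean_squared_distance(a, b):
    """Mean of squared element-wise differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def adaptive_rmu_loss(h_upd_forget, h_frz_forget,
                      h_upd_retain, h_frz_retain,
                      u, beta=5.0, alpha=1.0):
    """Hypothetical single-example loss (beta, alpha are illustrative values).

    Forget term: move the updated activation toward u scaled by
    beta * ||frozen activation||, so the target norm adapts to the layer.
    Retain term: keep the updated activation close to the frozen one.
    """
    scale = beta * l2_norm(h_frz_forget)
    target = [scale * x for x in u]
    forget_term = mean_squared_distance(h_upd_forget, target)
    retain_term = mean_squared_distance(h_upd_retain, h_frz_retain)
    return forget_term + alpha * retain_term
```

In a real setup the four activation vectors would be read from one chosen decoder layer of the updated and frozen models, and that layer choice is the quantity a layer-wise analysis like the one described above would vary.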
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
