MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng, Hadi Amiri

TL;DR
MU-Bench is the first comprehensive benchmark for evaluating machine unlearning across multiple tasks and modalities, enabling consistent comparison and advancing research in the field.
Contribution
It unifies evaluation protocols, covers diverse tasks including speech and video, and provides tools and leaderboards for scalable MU research.
Findings
RandLabel and SalUn are the most effective MU methods on MU-Bench.
BadT and SCRUB can reach chance-level performance on deletion sets.
MU-Bench also includes analyses of scalability, parameter-efficient fine-tuning, and dataset biases.
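To make the findings above more concrete, here is a minimal sketch of two ideas they rely on. RandLabel-style unlearning is commonly described as fine-tuning on the deletion set with randomly reassigned labels. The first helper only builds those labels and does not perform the fine-tuning. It draws a label different from the true one, which is our assumption. The second helper measures how far deletion-set accuracy is from the 1/K chance level, one simple way to read "chance-level performance on deletion sets." The names `random_relabel`, `accuracy`, and `gap_to_chance` are hypothetical and are not MU-Bench's API.

```python
import random
from typing import List, Sequence


def random_relabel(labels: Sequence[int], num_classes: int, seed: int = 0) -> List[int]:
    """Assign each deletion-set sample a random label different from its true one.

    Hypothetical helper for the relabeling step of RandLabel-style unlearning;
    the subsequent fine-tuning on these labels is not shown. Assumes labels
    are integers in [0, num_classes).
    """
    if num_classes < 2:
        raise ValueError("need at least two classes to pick a different label")
    rng = random.Random(seed)  # fixed seed -> reproducible relabeling
    new_labels = []
    for y in labels:
        # Sample uniformly from the K-1 classes other than the true label y.
        choice = rng.randrange(num_classes - 1)
        new_labels.append(choice if choice < y else choice + 1)
    return new_labels


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def gap_to_chance(deletion_accuracy: float, num_classes: int) -> float:
    """Absolute distance between deletion-set accuracy and the 1/K chance level."""
    return abs(deletion_accuracy - 1.0 / num_classes)


# Usage with toy values (hypothetical predictions after unlearning).
labels = [0, 1, 2, 3, 0, 1]
noisy = random_relabel(labels, num_classes=4, seed=42)
preds = [0, 2, 1, 3]
print(gap_to_chance(accuracy(preds, [0, 1, 2, 3]), num_classes=4))  # 0.25
```

A gap near zero means the model does no better than guessing on the deleted samples. Whether chance level or a retrained model's accuracy is the right target depends on the evaluation protocol.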
Abstract
Recent advancements in Machine Unlearning (MU) have introduced solutions to selectively remove certain training samples, such as those with outdated or sensitive information, from trained models. Despite these advancements, evaluation of MU methods has been inconsistent: studies employ different trained models, architectures, and sample-removal strategies, which hampers accurate comparison. In addition, prior MU approaches have mainly focused on single tasks or modalities, which limits their coverage. To address these limitations, we develop MU-Bench, the first comprehensive benchmark for MU that (i) unifies the sets of deleted samples and trained models, and (ii) provides broad coverage of tasks and data modalities, including previously unexplored domains such as speech and video classification. Our evaluation shows that RandLabel and SalUn are the most effective general MU approaches…
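The abstract notes that inconsistent deletion sets hamper comparison. One way to keep deletion sets identical across methods is to derive them from a fixed sample count, deletion ratio, and random seed. The sketch below assumes that scheme purely for illustration. `make_deletion_split`, `deletion_ratio`, and `seed` are hypothetical names, and this is not MU-Bench's actual interface.

```python
import random
from typing import List, Tuple


def make_deletion_split(
    num_samples: int, deletion_ratio: float, seed: int
) -> Tuple[List[int], List[int]]:
    """Split sample indices into a deletion (forget) set and a remaining set.

    Hypothetical helper: fixing num_samples, deletion_ratio, and seed yields
    the same deletion set for every unlearning method being compared.
    """
    if not 0.0 <= deletion_ratio <= 1.0:
        raise ValueError("deletion_ratio must be in [0, 1]")
    num_deleted = round(num_samples * deletion_ratio)
    rng = random.Random(seed)  # isolated RNG so global random state is untouched
    deleted = sorted(rng.sample(range(num_samples), num_deleted))
    deleted_set = set(deleted)
    remaining = [i for i in range(num_samples) if i not in deleted_set]
    return deleted, remaining


# Usage: every method evaluated with these arguments sees the same 5 deleted samples.
deleted, remaining = make_deletion_split(100, 0.05, seed=0)
print(len(deleted), len(remaining))  # 5 95
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the split reproducible even if other code reseeds or consumes the global generator.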
