Forget Me Not: Fighting Local Overfitting with Knowledge Fusion and Distillation

Uri Stern; Eli Corn; Daphna Weinshall

arXiv:2507.08686 · cs.LG · July 14, 2025

TL;DR

This paper investigates local overfitting in deep neural networks, introduces a score to measure it, and proposes a knowledge fusion and distillation method to recover forgotten knowledge, improving performance especially with noisy labels.

Contribution

The paper introduces a novel score for local overfitting, links local overfitting to the double descent phenomenon, and presents a two-stage knowledge fusion and distillation approach to mitigate its effects.

Findings

01. The new score effectively detects local overfitting.

02. Knowledge fusion and distillation improve performance under label noise.

03. The method reduces training and inference complexity.

Abstract

Overfitting in deep neural networks occurs less frequently than expected. This is a puzzling observation, as theory predicts that greater model capacity should eventually lead to overfitting -- yet this is rarely seen in practice. But what if overfitting does occur, not globally, but in specific sub-regions of the data space? In this work, we introduce a novel score that measures the forgetting rate of deep models on validation data, capturing what we term local overfitting: a performance degradation confined to certain regions of the input space. We demonstrate that local overfitting can arise even without conventional overfitting, and is closely linked to the double descent phenomenon. Building on these insights, we introduce a two-stage approach that leverages the training history of a single model to recover and retain forgotten knowledge: first, by aggregating checkpoints into an…
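The forgetting idea in the abstract can be made concrete with a small sketch. The page does not give the score's formula, so the definition used here is an assumption: the fraction of validation examples that some earlier checkpoint classified correctly but the final checkpoint gets wrong. The function name `forgetting_rate` and the `correct_history` layout (one row of booleans per checkpoint, one column per validation example) are hypothetical and chosen only for illustration.

```python
def forgetting_rate(correct_history):
    """Fraction of validation examples that were 'forgotten'.

    correct_history: list of rows, one per saved checkpoint in training order.
    Each row holds one boolean per validation example: True if that checkpoint
    classified the example correctly.

    Assumed definition (not taken from the paper): an example counts as
    forgotten if at least one earlier checkpoint was correct on it while the
    final checkpoint is wrong.
    """
    if len(correct_history) < 2:
        raise ValueError("need at least two checkpoints to measure forgetting")

    *earlier, final = correct_history
    num_examples = len(final)
    if num_examples == 0:
        raise ValueError("need at least one validation example")

    forgotten = 0
    for i in range(num_examples):
        was_correct_earlier = any(row[i] for row in earlier)
        if was_correct_earlier and not final[i]:
            forgotten += 1
    return forgotten / num_examples


# Toy history: 3 checkpoints, 4 validation examples.
history = [
    [True, False, True, False],   # checkpoint 1
    [True, True, False, False],   # checkpoint 2
    [False, True, True, False],   # final checkpoint
]
print(forgetting_rate(history))  # 0.25: only the first example was forgotten
```

Under this reading, a nonzero value can appear even when overall validation accuracy keeps rising, which is the situation the abstract describes as local overfitting without conventional overfitting.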
