A Calibrated Memorization Index (MI) for Detecting Training Data Leakage in Generative MRI Models

Yash Deo; Yan Jia; Toni Lassila; Victoria J Hodge; Alejandro F Frang; Chenghao Qian; Siyuan Kang; Ibrahim Habli

arXiv:2602.13066·cs.CV·February 16, 2026

TL;DR

This paper introduces a calibrated metric to detect memorization and data leakage in MRI generative models, addressing privacy concerns by accurately identifying duplicated training images.

Contribution

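It proposes a calibrated per-sample metric that extracts image features with an MRI foundation model, aggregates whitened nearest-neighbor similarities across multiple layers, and maps them to bounded scores for detecting duplicated training data.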

Findings

01

At the sample level, the metric detects duplicated images with near-perfect accuracy.

02

It gives more consistent scores across three MRI datasets, under controlled duplication percentages and typical image augmentations.

03

By flagging generated images that duplicate training data, the approach supports privacy auditing of medical image generators.

Abstract

Image generative models are known to duplicate images from the training data as part of their outputs, which can lead to privacy concerns when used for medical image generation. We propose a calibrated per-sample metric for detecting memorization and duplication of training data. Our metric uses image features extracted using an MRI foundation model, aggregates multi-layer whitened nearest-neighbor similarities, and maps them to a bounded Overfit/Novelty Index (ONI) and Memorization Index (MI) scores. Across three MRI datasets with controlled duplication percentages and typical image augmentations, our metric robustly detects duplication and provides more consistent metric values across datasets. At the sample level, our metric achieves near-perfect detection of duplicates.
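
The abstract names the ingredients of the metric (per-layer features, whitening, nearest-neighbor similarity, aggregation across layers, a bounded score) but not its formulas. The following is a minimal sketch of that general pipeline under stated assumptions, not the authors' method: random arrays stand in for foundation-model features, cosine similarity is assumed as the similarity measure, layers are averaged with equal weight, and the final clipping to [0, 1] is a placeholder for the paper's calibration into ONI and MI. The function names `fit_whitening`, `nn_cosine`, and `memorization_scores` are hypothetical.

```python
import numpy as np

def fit_whitening(train, eps=1e-6):
    """Fit a ZCA-style whitening transform on training features (assumed choice)."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False) + eps * np.eye(train.shape[1])
    eigval, eigvec = np.linalg.eigh(cov)
    transform = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return mean, transform

def nn_cosine(gen, train):
    """For each generated row, the highest cosine similarity to any training row."""
    g = gen / np.maximum(np.linalg.norm(gen, axis=1, keepdims=True), 1e-12)
    t = train / np.maximum(np.linalg.norm(train, axis=1, keepdims=True), 1e-12)
    return (g @ t.T).max(axis=1)

def memorization_scores(gen_layers, train_layers):
    """Average whitened nearest-neighbor similarity over layers, bounded to [0, 1].

    The equal-weight mean and the clipping are illustrative assumptions,
    not the calibration described in the paper.
    """
    per_layer = []
    for gen, train in zip(gen_layers, train_layers):
        mean, transform = fit_whitening(train)
        per_layer.append(nn_cosine((gen - mean) @ transform,
                                   (train - mean) @ transform))
    return np.clip(np.mean(per_layer, axis=0), 0.0, 1.0)

# Synthetic stand-in for features from two layers of a feature extractor.
rng = np.random.default_rng(0)
train_layers = [rng.normal(size=(200, 32)) for _ in range(2)]
novel_layers = [rng.normal(size=(5, 32)) for _ in range(2)]
# The first three "generated" samples are exact copies of training samples.
gen_layers = [np.vstack([t[:3], n]) for t, n in zip(train_layers, novel_layers)]

scores = memorization_scores(gen_layers, train_layers)
print(np.round(scores, 3))
```

In this toy setup the copied samples score at the top of the range while the novel ones score clearly lower, which is the per-sample behavior the abstract describes. A real use would replace the random arrays with features from an actual MRI foundation model and would need a calibration step so that scores are comparable across datasets.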

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Digital Media Forensic Detection