Reliable Evaluation of MRI Motion Correction: Dataset and Insights

Kun Wang; Tobit Klug; Stefan Ruschke; Jan S. Kirschke; Reinhard Heckel

arXiv:2506.05975 · eess.IV · June 9, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a new dataset and metric for evaluating MRI motion correction methods, highlighting the advantages of real-world evaluation combined with a feature-space metric over simulated or reference-free approaches.

Contribution

It provides PMoC3D, a dataset of real-world motion-corrupted MRI scans, and MoMRISim, a new feature-space metric for more reliable evaluation of correction methods.

Findings

01. Real-world evaluation with MoMRISim is most reliable.

02. Simulated motion evaluation overestimates performance.

03. Reference-free evaluation tends to overrate deep learning outputs.

Abstract

Correcting motion artifacts in MRI is important, as they can hinder accurate diagnosis. However, evaluating deep learning-based and classical motion correction methods remains fundamentally difficult due to the lack of accessible ground-truth target data. To address this challenge, we study three evaluation approaches: real-world evaluation based on reference scans, simulated motion, and reference-free evaluation, each with its merits and shortcomings. To enable evaluation with real-world motion artifacts, we release PMoC3D, a dataset consisting of unprocessed Paired Motion-Corrupted 3D brain MRI data. To advance evaluation quality, we introduce MoMRISim, a feature-space metric trained for evaluating motion reconstructions. We assess each evaluation approach and find real-world evaluation together with MoMRISim, while not perfect, to be most reliable. Evaluation based on simulated…
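As a rough illustration of what reference-based evaluation means in practice, the sketch below scores a reconstruction against a motion-free reference volume with PSNR, one of the pixel-level metrics the reviews below mention. It is not MoMRISim and not the authors' pipeline. The function name `psnr`, the choice of the reference's peak magnitude as the signal peak, and the synthetic noise used as a stand-in for motion artifacts are all assumptions made for illustration, and the volumes are assumed to be already aligned.

```python
import numpy as np


def psnr(reference, reconstruction):
    """Peak signal-to-noise ratio in dB (hypothetical helper).

    Assumption: the peak is taken as the maximum magnitude of the reference.
    """
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    peak = np.max(np.abs(reference))
    return 10.0 * np.log10(peak ** 2 / mse)


# Synthetic stand-ins for a 3D reference and two reconstructions.
# Additive Gaussian noise is NOT a model of motion artifacts; it only
# produces two volumes with different error levels.
rng = np.random.default_rng(0)
reference = rng.random((16, 16, 16))
mild = reference + 0.01 * rng.standard_normal(reference.shape)
severe = reference + 0.10 * rng.standard_normal(reference.shape)

psnr_mild = psnr(reference, mild)
psnr_severe = psnr(reference, severe)
print(f"mild: {psnr_mild:.1f} dB, severe: {psnr_severe:.1f} dB")
```

A higher PSNR indicates a smaller pixel-wise error relative to the reference. The paper's argument is that a feature-space metric trained for this task is a more reliable judge than pixel-level scores of this kind.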

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

- **Well-scoped and clearly structured problem framing.** The paper provides a well-defined problem setup and a thoughtful taxonomy covering paired, simulated, and reference-free evaluations. It clearly explains why obtaining ground truth is inherently difficult in 3D MRI, which makes the motivation convincing and the study design logical.
- **PMoC3D: a valuable paired 3D dataset with raw k-space.** The authors present PMoC3D, a paired dataset that includes raw multi-coil k-space data, coil

Weaknesses

**Limited dataset size and missing motion trajectory data.** The dataset PMoC3D is an important contribution, but there are some limitations. First, the sample size is quite small, containing only eight subjects. Second, although the dataset includes rich metadata such as raw k-space, motion instructions, and timestamps, it lacks **true motion trajectories**, meaning the exact six-degree-of-freedom head poses over time. Including this information would greatly enhance the dataset’s scientific

Reviewer 02 · Rating 6 · Confidence 5

Strengths

- The collected PMoC3D dataset includes paired motion-free and motion-corrupted 3D k-space raw data, covering diverse motion types. Its construction and release can enable the development of more advanced deep learning–based MoCo methods, which is a critical contribution.
- The work explores many quantitative evaluation metrics for MoCo reconstructions, including reference-based metrics (e.g., pixel-level SSIM and PSNR, feature-level DreamSim) and reference-free metrics (e.g., AES, TG, and VIM s

Weaknesses

- The PMoC3D dataset includes only 8 subjects. While I understand that MRI data collection is costly and time-consuming, this number is still relatively small. Therefore, the dataset may be more suitable as a test set than for training deep learning models.
- To my knowledge, the combination of deep learning models and physics-based iterative reconstruction is currently a mainstream paradigm for MRI MoCo [1][2][3]. These methods typically involve motion trajectory estimation. However, the

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- **Valuable dataset**: The creation of a dataset with real MRI scans acquired at different, controlled levels of motion artefact, including a motion-free reference for each subject, is a significant strength. Such data is difficult and costly to obtain, and it provides a valuable resource for developing and rigorously evaluating motion detection and correction algorithms.
- **Novel and relevant metrics**: The introduction of two new metrics: a tailored reference-based metric (MoMRISim) and a nov

Weaknesses

- **Poor clarity and structure**: The paper's organisation significantly hinders comprehension. The introduction lacks a foundational explanation of motion artefacts, their impact, and existing detection/correction methods. Key details, such as the derivation of the "perceived motion artefact score" (Section 2.2), are insufficiently explained in the main text. Section 3, which covers evaluation approaches, confusingly mixes descriptions of existing and proposed solutions without clear distinctio

Taxonomy

Topics: Advanced MRI Techniques and Applications · Functional Brain Connectivity Studies · Sparse and Compressive Sensing Techniques