Curating Subject ID Labels using Keypoint Signatures
Laurent Chauvin, Matthew Toews

TL;DR
This paper introduces an efficient system for verifying and correcting subject ID labels in large medical image datasets using 3D keypoint signatures, reducing errors and improving data integrity.
Contribution
The paper presents a novel keypoint signature-based method for curating subject ID labels, and detecting errors in them, in large-scale medical imaging datasets.
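The summary does not say how two keypoint signatures are compared. Here is a minimal sketch of one common option. It assumes that each image's signature is a NumPy array of local descriptors, one row per 3D keypoint, and scores a pair of images by counting mutual nearest-neighbour descriptor matches that also pass a distance-ratio test. The function name `match_count` and the ratio value 0.8 are hypothetical choices for illustration. They are not the authors' implementation.

```python
import numpy as np


def match_count(desc_a: np.ndarray, desc_b: np.ndarray, ratio: float = 0.8) -> int:
    """Hypothetical similarity between two keypoint signatures.

    Counts descriptor pairs that are mutual nearest neighbours and whose
    nearest distance is clearly smaller than the second-nearest (ratio test).
    """
    if len(desc_a) == 0 or len(desc_b) == 0:
        return 0  # an empty signature cannot match anything
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    d = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=-1)
    nn_ab = d.argmin(axis=1)  # best match in B for each descriptor of A
    nn_ba = d.argmin(axis=0)  # best match in A for each descriptor of B
    count = 0
    for i, j in enumerate(nn_ab):
        if nn_ba[j] != i:
            continue  # not a mutual nearest neighbour
        row = np.sort(d[i])
        # Reject ambiguous matches (including exact ties with a second candidate).
        if len(row) > 1 and np.sqrt(row[0]) >= ratio * np.sqrt(row[1]):
            continue
        count += 1
    return count
```

Images of the same subject should share many distinctive keypoints, so their count should be high. Unrelated subjects should mostly produce ambiguous or non-mutual matches.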
Findings
Discovered previously unknown labeling errors in widely used public brain MRI datasets
Improved accuracy of subject identification in medical images
Enhanced dataset reliability for machine learning applications
Abstract
Subject ID labels are unique, anonymized codes that can be used to group all images of a subject while maintaining anonymity. ID errors may be inadvertently introduced through manual error during enrollment. They may then introduce systematic error into machine learning evaluation (e.g. due to double-dipping) or lead to patient misdiagnosis in clinical contexts. Here we describe a highly efficient system for curating subject ID labels in large generic medical image datasets, based on the 3D image keypoint representation, which recently led to the discovery of previously unknown labeling errors in widely used public brain MRI datasets.
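Curation can be framed as checking whether image-level evidence agrees with the ID labels. This hypothetical sketch illustrates one way to do that. It is not the authors' algorithm. It assumes pairwise match scores between images are already available (for example, keypoint match counts). It links each image pair whose score meets an assumed threshold, groups linked images into connected components with union-find, and then flags two kinds of disagreement with the labels. The function names, the score dictionary format and the threshold are all illustrative assumptions.

```python
from itertools import combinations


def find_groups(n, edges):
    """Connected components via union-find; returns a root id per image."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return [find(i) for i in range(n)]


def flag_label_conflicts(labels, scores, threshold):
    """Hypothetical label check.

    labels: subject ID per image; scores: {(i, j): match score} for i < j.
    Pairs whose score meets `threshold` are assumed to show the same subject.
    """
    n = len(labels)
    links = [(i, j) for (i, j), s in scores.items() if s >= threshold]
    groups = find_groups(n, links)
    flags = []
    for i, j in combinations(range(n), 2):
        matched = groups[i] == groups[j]
        same_label = labels[i] == labels[j]
        if matched and not same_label:
            flags.append(("possible_same_subject_different_id", i, j))
        elif same_label and not matched:
            flags.append(("possible_different_subject_same_id", i, j))
    return flags
```

For example, with labels `["s1", "s1", "s2", "s3"]` and high scores only for pairs (0, 1) and (2, 3), the single flag is `("possible_same_subject_different_id", 2, 3)`. Flagged pairs are candidates for manual review, not automatic relabelling. Connected components are transitive, so one spurious high-scoring pair can merge two subjects. That is one reason a practical system might prefer stricter grouping.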
Taxonomy
Topics: Data Management and Algorithms · Graph Theory and Algorithms · Data Quality and Management
