Curating Subject ID Labels using Keypoint Signatures

Laurent Chauvin; Matthew Toews

arXiv:2110.04055 · cs.CV · October 11, 2021

TL;DR

This paper introduces an efficient system for verifying and correcting subject ID labels in large medical image datasets using 3D keypoint signatures, reducing errors and improving data integrity.

Contribution

The paper presents a novel keypoint signature-based method for curating and error-detecting subject ID labels in large-scale medical imaging datasets.

Findings

01. Discovered unknown labeling errors in public brain MRI datasets

02. Improved accuracy of subject identification in medical images

03. Enhanced dataset reliability for machine learning applications

Abstract

Subject ID labels are unique, anonymized codes that can be used to group all images of a subject while maintaining anonymity. ID errors may be inadvertently introduced through manual error during enrollment, and may introduce systematic error into machine learning evaluation (e.g. due to double-dipping) or lead to patient misdiagnosis in clinical contexts. Here we describe a highly efficient system for curating subject ID labels in large generic medical image datasets, based on the 3D image keypoint representation, which recently led to the discovery of previously unknown labeling errors in widely-used public brain MRI datasets.
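
The abstract does not spell out how label errors are flagged, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a pairwise image similarity score is already available (in the paper this would come from comparing keypoint signatures; here it is a hand-written toy matrix) and that a single threshold separates same-subject from different-subject pairs. The function name `flag_label_conflicts`, the threshold value, and the data are all hypothetical.

```python
from itertools import combinations


def flag_label_conflicts(ids, similarity, threshold):
    """Return (i, j, reason) for image pairs whose ID labels disagree
    with their image similarity.

    ids        -- subject ID label per image
    similarity -- square matrix, similarity[i][j] for images i and j
                  (assumed precomputed; how it is computed is not shown here)
    threshold  -- assumed cut-off above which two images are treated as
                  showing the same subject
    """
    conflicts = []
    for i, j in combinations(range(len(ids)), 2):
        same_label = ids[i] == ids[j]
        looks_same = similarity[i][j] >= threshold
        if looks_same and not same_label:
            # Candidate for one subject enrolled under two IDs,
            # the case that can cause double-dipping.
            conflicts.append((i, j, "different IDs, similar images"))
        elif same_label and not looks_same:
            # Candidate for two subjects sharing one ID.
            conflicts.append((i, j, "same ID, dissimilar images"))
    return conflicts


# Toy data (hypothetical): image 3 is labeled "C" but resembles subject "A".
ids = ["A", "A", "B", "C"]
similarity = [
    [1.0, 0.9, 0.1, 0.8],
    [0.9, 1.0, 0.2, 0.7],
    [0.1, 0.2, 1.0, 0.1],
    [0.8, 0.7, 0.1, 1.0],
]
conflicts = flag_label_conflicts(ids, similarity, threshold=0.5)
for i, j, reason in conflicts:
    print(f"images {i} and {j}: {reason}")
```

Flagged pairs are candidates for manual review rather than confirmed errors. This exhaustive pairwise loop is quadratic in the number of images, so a dataset of the scale the abstract mentions would need a more efficient search than this sketch provides.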

Taxonomy

Topics: Data Management and Algorithms · Graph Theory and Algorithms · Data Quality and Management