[Citation needed] Data usage and citation practices in medical imaging conferences
Théo Sourget, Ahmet Akkoç, Stinna Winther, Christine Lyngbye Galsgaard, Amelia Jiménez-Sánchez, Dovile Juodelyte, Caroline Petitjean, Veronika Cheplygina

TL;DR
This paper introduces two open-source tools for detecting dataset usage in medical imaging research papers and analyzes dataset citation and mention practices over a decade, showing that usage is concentrated on a small number of datasets and that citation practices vary between papers.
Contribution
The authors developed and applied novel tools for automated detection of dataset references, providing insights into dataset usage and citation practices in medical imaging conferences.
Findings
Limited diversity in dataset usage among papers.
Different citation practices complicate tracking dataset references.
Dataset usage has become more concentrated over time.
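One simple way to put a number on "concentration" is the share of all recorded dataset usages that goes to the k most-used datasets. The sketch below is only an illustration under that assumption: it is not necessarily the measure used by the authors, the function name `top_k_share` is hypothetical, and the usage records are toy values rather than data from the study.

```python
from collections import Counter


def top_k_share(usages, k=1):
    """Return the fraction of usages accounted for by the k most-used datasets."""
    counts = Counter(usages)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


# Toy, made-up records (one entry per paper-dataset pair); NOT results from the paper.
usages_by_year = {
    2015: ["A", "B", "C", "D"],
    2023: ["A", "A", "A", "B"],
}

# A rising top-1 share over the years would indicate growing concentration.
shares = {year: top_k_share(u, k=1) for year, u in usages_by_year.items()}
```

A share close to 1 means almost all usage falls on the top k datasets; comparing the value year by year gives a rough view of the trend.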
Abstract
Medical imaging papers often focus on methodology, but the quality of the algorithms and the validity of the conclusions are highly dependent on the datasets used. As creating datasets requires a lot of effort, researchers often use publicly available datasets. However, there is no adopted standard for citing the datasets used in scientific papers, which makes dataset usage difficult to track. In this work, we present two open-source tools we created that could help with the detection of dataset usage: a pipeline (https://github.com/TheoSourget/Public_Medical_Datasets_References) using OpenAlex and full-text analysis, and a PDF annotation software (https://github.com/TheoSourget/pdf_annotator) used in our study to manually label the presence of datasets. We applied both tools on a study of the usage of 20 publicly available medical datasets in papers from MICCAI and MIDL. We…
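The two detection routes named in the abstract (citation lookup through OpenAlex and full-text analysis) can be sketched roughly as follows. This is a minimal sketch and not the authors' pipeline: the helper names `citing_works_url` and `find_dataset_mentions`, the work ID `W0000000000`, and the dataset name `ExampleCXR` are all hypothetical placeholders. The only external detail relied on is the OpenAlex works endpoint with its `cites` filter; no network request is made here.

```python
import re
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def citing_works_url(work_id, per_page=200):
    """Build an OpenAlex query URL listing works that cite the given work ID."""
    query = {"filter": f"cites:{work_id}", "per-page": per_page}
    return f"{OPENALEX_WORKS}?{urlencode(query, safe=':')}"


def find_dataset_mentions(text, aliases):
    """Return the datasets whose name or alias appears as a whole word in text."""
    found = set()
    for dataset, names in aliases.items():
        for name in names:
            # Lookarounds stop a match inside a longer token (e.g. "ExampleCXR2").
            pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(dataset)
                break
    return found


# Hypothetical alias table; a real one would list each dataset's known names.
aliases = {"ExampleCXR": ["ExampleCXR", "Example Chest X-ray"]}
url = citing_works_url("W0000000000")
mentions = find_dataset_mentions("We evaluate on the ExampleCXR dataset.", aliases)
```

Combining both routes matters because a paper can use a dataset without citing its reference paper, or cite the paper without using the data; name matching in the full text catches the first case, while the citation list catches papers that cite the reference paper under a name the alias table does not cover.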
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · scientometrics and bibliometrics research
Methods: Sparse Evolutionary Training · Focus
