Guidelines and Best Practices to Share Deidentified Data and Code
Nicholas J. Horton, Sara Stoudt

TL;DR
This paper discusses the importance of sharing deidentified data and code to enhance reproducibility and open science, providing guidelines and best practices for authors to facilitate data sharing in educational research.
Contribution
It offers a comprehensive review of data and code sharing policies, trends, and practical advice tailored for authors in the context of educational statistics and data science.
Findings
Data and code sharing policies help increase reproducibility and make more data available for reuse.
Authors benefit from clear guidelines on data and code sharing.
Open science practices are evolving with new sharing options.
Abstract
In 2022, the Journal of Statistics and Data Science Education (JSDSE) instituted augmented requirements for authors to post deidentified data and code underlying their papers. These changes were prompted by an increased focus on reproducibility and open science (NASEM 2019). A recent review of data availability practices noted that "such policies help increase the reproducibility of the published literature, as well as make a larger body of data available for reuse and re-analysis" (PLOS ONE, 2024). JSDSE values accessibility as it endeavors to share knowledge that can improve educational approaches to teaching statistics and data science. Because institution, environment, and students differ across readers of the journal, it is especially important to facilitate the transfer of a journal article's findings to new contexts. This process may require digging into more of the details,…
