Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE
Valdemar Švábenský, Brendan Flanagan, Erwin Daniel López Zapata, Atsushi Shimada

TL;DR
This paper surveys 172 open datasets in learning analytics, analyzes their use and gaps, and provides guidelines to promote open data practices in the field.
Contribution
It offers the most comprehensive collection and analysis of open educational datasets to date, including a detailed categorization and practical guidelines for researchers.
Findings
Identified 172 datasets used in 204 publications.
Most datasets were not previously captured in surveys.
Provided a checklist and recommendations to improve open data sharing.
Abstract
Open datasets play a crucial role in three research domains that intersect data science and education: learning analytics, educational data mining, and artificial intelligence in education. Researchers in these domains apply computational methods to analyze data from educational contexts, aiming to better understand and improve teaching and learning. Providing open datasets alongside research papers supports reproducibility, collaboration, and trust in research findings. It also provides individual benefits for authors, such as greater visibility, credibility, and citation potential. Despite these advantages, the availability of open datasets and the associated practices within the learning analytics research communities, especially at their flagship conference venues, remain unclear. We surveyed available datasets published alongside research papers in learning analytics. We manually…
