Insights from Publishing Open Data in Industry-Academia Collaboration
Per Erik Strandberg, Philipp Peterseil, Julian Karoliny, Johanna Kallio, Johannes Peltola

TL;DR
This paper investigates the challenges, motivations, and lessons learned from publishing open data in industry-academia collaborations, highlighting the importance of planning, licensing awareness, and the value of synthetic data.
Contribution
It provides empirical insights into open data practices in industry-academia collaborations, emphasizing planning, licensing, and the significance of synthetic data for research.
Findings
Only 2.4% of datasets had accompanying scripts to support reuse.
Authors are often unaware of why licences matter or which licence to choose.
Synthetic data can be highly meaningful for research.
Abstract
Effective data management and sharing are critical success factors in industry-academia collaboration. This paper explores the motivations and lessons learned from publishing open data sets in such collaborations. Through a survey of participants in a European research project that published 13 data sets, and an analysis of metadata from almost 281 thousand datasets in Zenodo, we collected qualitative and quantitative results on motivations, achievements, research questions, licences and file types. Through inductive reasoning and statistical analysis, we found that planning the data collection is essential, and that only a few datasets (2.4%) had accompanying scripts for improved reuse. We also found that authors are not well aware of the importance of licences or which licence to choose. Finally, we found that data with a synthetic origin, collected with simulations and potentially mixed…
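The abstract's quantitative findings (the share of datasets with accompanying scripts, and licence usage) come from an analysis of file types and licences in dataset metadata. The sketch below is a minimal illustration of how such figures could be derived; it is not the authors' method. The record layout (`files`, `license`), the set of extensions treated as scripts, and the example records are all assumptions made for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumption: which extensions count as "scripts". The paper's actual
# criteria are not stated in this summary.
SCRIPT_EXTENSIONS = {".py", ".r", ".m", ".ipynb", ".sh", ".jl"}


def has_script(filenames):
    """Return True if any file name ends in an assumed script extension."""
    return any(
        PurePosixPath(name).suffix.lower() in SCRIPT_EXTENSIONS
        for name in filenames
    )


def script_share(records):
    """Fraction of records whose file list contains at least one script."""
    if not records:
        return 0.0
    with_scripts = sum(1 for record in records if has_script(record["files"]))
    return with_scripts / len(records)


def licence_counts(records):
    """Count records per licence, grouping missing licences as 'unspecified'."""
    return Counter(record.get("license") or "unspecified" for record in records)


# Made-up example records (hypothetical layout, not real Zenodo data).
records = [
    {"files": ["data.csv", "README.md"], "license": "cc-by-4.0"},
    {"files": ["measurements.zip", "plot_results.py"], "license": "cc-by-4.0"},
    {"files": ["scan.h5"], "license": None},
    {"files": ["log.txt"], "license": "cc0-1.0"},
]

share = script_share(records)
licences = licence_counts(records)
```

Matching on file extension is a deliberately crude heuristic: it misses scripts packed inside archives and can miscount files that share an extension with a scripting language.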
Taxonomy
Topics: Semantic Web and Ontologies
