Structuring data analysis projects in the Open Science era with Kerblam!
Luca Visentin, Luca Munaron, Federico Alessandro Ruffinatti

TL;DR
This paper examines the variability in data analysis project structures, proposes guiding principles for creating more standardized projects, and introduces Kerblam!, a tool to improve project management and sharing.
Contribution
The paper identifies the lack of standardization in data analysis project structures, proposes guiding principles for creating new projects, and presents Kerblam! to enhance transparency and collaboration in data analysis workflows.
Findings
Little structural overlap among existing templates
Guiding principles can standardize project creation
Kerblam! improves project management and sharing
Abstract
Structuring data analysis projects, that is, defining the layout of files and folders needed to analyze data using existing tools and novel code, largely follows personal preferences. In this work, we look at the structure of several data analysis project templates and find little structural overlap. We highlight the parts that are similar between them, and propose guiding principles to keep in mind when one wishes to create a new data analysis project. Finally, we present Kerblam!, a project management tool that can expedite project data management, execution of workflow managers, and sharing of the resulting workflow and analysis outputs. We hope that, by following these principles and using Kerblam!, the landscape of data analysis projects can become more transparent, understandable, and ultimately useful to the wider community.
Taxonomy
Topics: Big Data Technologies and Applications
