Replicating Data Pipelines with GrimoireLab
Kalvin Eng, Hareem Sahar

TL;DR
This paper demonstrates how GrimoireLab can replicate a Gitter data analysis pipeline, comparing its performance and usability to previous methods to guide future researchers in data pipeline implementation.
Contribution
The paper provides a practical comparison of GrimoireLab with existing pipelines for Gitter data, highlighting its advantages in speed, consistency, and ease of use.
Findings
GrimoireLab offers faster data processing than previous pipelines.
It maintains high data consistency and organization.
The learning curve for GrimoireLab is manageable for new users.
Abstract
In this paper, we present our MSR Hackathon 2022 project that replicates an existing Gitter study using GrimoireLab. We compare the previous study's pipeline with our GrimoireLab implementation in terms of speed, data consistency, organization, and the learning curve to get started. We believe our experience with GrimoireLab can help future researchers make the right choice when implementing their data pipelines over Gitter and GitHub data.
Taxonomy
Topics: Context-Aware Activity Recognition Systems
