Replicating Data Pipelines with GrimoireLab

Kalvin Eng; Hareem Sahar

arXiv:2205.02727·cs.SE·May 6, 2022

TL;DR

This paper demonstrates how GrimoireLab can replicate a Gitter data analysis pipeline, comparing its performance and usability to previous methods to guide future researchers in data pipeline implementation.

Contribution

The paper provides a practical comparison of GrimoireLab with existing pipelines for Gitter data, highlighting its advantages in speed, consistency, and ease of use.

Findings

01

GrimoireLab offers faster data processing than previous pipelines.

02

It maintains high data consistency and organization.

03

The learning curve for GrimoireLab is manageable for new users.

Abstract

In this paper, we present our MSR Hackathon 2022 project, which replicates an existing Gitter study using GrimoireLab. We compare the previous study's pipeline with our GrimoireLab implementation in terms of speed, data consistency, organization, and the learning curve to get started. We believe our experience with GrimoireLab can help future researchers make the right choice when implementing their data pipelines for Gitter and GitHub data.

Code & Models

Repositories

k----n/GrimoireGitter (official repository)

Taxonomy

Topics: Context-Aware Activity Recognition Systems