On the Suitability of Hugging Face Hub for Empirical Studies

Adem Ait; Javier Luis Cánovas Izquierdo; Jordi Cabot

arXiv:2307.14841·cs.SE·July 28, 2023·1 cites

Open Access

TL;DR

This paper explores the potential of Hugging Face Hub (HFH) as a data source for empirical software engineering studies, comparing its features with those of GitHub and GitLab and analyzing its data to assess its suitability.

Contribution

It provides an initial evaluation of Hugging Face Hub's features and data for empirical research, filling a gap in understanding its research potential.

Findings

01

HFH has unique features compared to GitHub and GitLab.

02

The data in HFH shows promise for empirical studies.

03

Further analysis is needed to confirm its research suitability.

Abstract

Background. Empirical studies in software engineering rely mainly on the data available on code hosting platforms, with GitHub being the most representative. In recent years, however, the emergence of Machine Learning (ML) has led to platforms designed specifically for developing ML-based projects, with Hugging Face Hub (HFH) being the most popular. With over 250k repositories and growing fast, HFH is becoming a promising ecosystem of ML artifacts and therefore a potential source of data for empirical studies. So far, however, no study has evaluated the potential of HFH for this purpose. Objective. In this proposal for a registered report, we aim to perform an exploratory study of the current state of HFH to investigate its suitability as a source platform for empirical studies. Method. We conduct a…

Taxonomy

Topics: Scientific Computing and Data Management · Software Engineering Research · Big Data and Business Intelligence