Deep Learning Model Reuse in the HuggingFace Community: Challenges, Benefit and Trends
Mina Taraghi, Gianolli Dorcelus, Armstrong Foundjem, Florian Tambon, Foutse Khomh

TL;DR
This paper investigates the challenges, benefits, and trends of reusing large-scale pre-trained models (PTMs) in the HuggingFace community through a mixed-methods empirical study of its discussion forums and model hub, identifying recurring user challenges and trends in model types and documentation.
Contribution
It provides the first comprehensive taxonomy of PTM reuse challenges and benefits, along with quantitative insights into model trends and documentation evolution over time.
Findings
Common challenges include limited guidance for beginners and difficulty understanding model outputs.
Some models maintain high upload rates even as the number of topics related to them declines.
Model documentation quantity has not increased over time, hindering user comprehension.
Abstract
The ubiquity of large-scale Pre-Trained Models (PTMs) is on the rise, sparking interest in model hubs, dedicated platforms for hosting PTMs. Despite this trend, a comprehensive exploration of the challenges that users encounter and how the community leverages PTMs remains lacking. To address this gap, we conducted an extensive mixed-methods empirical study by focusing on discussion forums and the model hub of HuggingFace, the largest public model hub. Based on our qualitative analysis, we present a taxonomy of the challenges and benefits associated with PTM reuse within this community. We then conduct a quantitative study to track model-type trends and model documentation evolution over time. Our findings highlight prevalent challenges such as limited guidance for beginner users, struggles with model output comprehensibility in training or inference, and a lack of model…
Taxonomy
Topics: Scientific Computing and Data Management · Software System Performance and Reliability · Model-Driven Software Engineering Techniques
