Cataloguing Hugging Face Models to Software Engineering Activities: Automation and Findings
Alexandra González, Xavier Franch, David Lo, Silverio Martínez-Fernández

TL;DR
This paper derives a taxonomy of 147 Software Engineering tasks and uses it to catalog 2,205 Hugging Face pre-trained models, revealing usage patterns, gaps, and opportunities for automation in model selection.
Contribution
It introduces a novel SE-specific classification of PTMs, a curated catalog, and insights into their application trends and evaluation limitations.
Findings
Most models target code generation and coding tasks.
Text generation is the dominant ML task among SE PTMs.
Evaluation reporting is limited: only 9.6% provide benchmark results.
Abstract
Context: Open-source Pre-Trained Models (PTMs) provide extensive resources for various Machine Learning (ML) tasks, yet these resources lack a classification tailored to Software Engineering (SE) needs to support the reliable identification and reuse of models for SE. Objective: To address this gap, we derive a taxonomy encompassing 147 SE tasks and apply an SE-oriented classification to PTMs in a popular open-source ML repository, Hugging Face (HF). Method: Our repository mining study followed a five-phase pipeline: (i) identification of SE tasks from the literature; (ii) collection of PTM data from the HF API, including model card descriptions and metadata, and the abstracts of the associated arXiv papers; (iii) text processing to ensure consistency; (iv) a two-phase validation of SE relevance, involving humans and LLM assistance, supported by five pilot studies with human annotators and…
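As an illustration of phase (iii), the sketch below shows one plausible way to normalize a model card description before classification. The abstract does not specify the actual processing steps, so the function name `normalize_card_text` and every cleaning rule here are assumptions made for this example, not the authors' implementation.

```python
import re


def normalize_card_text(text: str) -> str:
    """Hypothetical cleanup of a model card description (assumed rules, not the paper's)."""
    # Drop a leading YAML front-matter block (--- ... ---) if one is present.
    text = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)
    # Remove fenced code blocks, which rarely describe the model's purpose.
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
    # Keep the label of markdown links and discard the target URL.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Remove any remaining bare URLs.
    text = re.sub(r"https?://\S+", " ", text)
    # Strip common markdown markers (headings, emphasis, inline code, quotes).
    text = re.sub(r"[#*`>]+", " ", text)
    # Collapse whitespace and lowercase so later matching is consistent.
    return re.sub(r"\s+", " ", text).strip().lower()


card = (
    "---\nlicense: mit\n---\n"
    "# CodeModel\n"
    "Generates **Python** code. Docs: [guide](https://example.com/docs)\n"
)
cleaned = normalize_card_text(card)
```

The same function could be applied to arXiv abstracts so that both text sources reach the validation phase in a comparable form.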
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Software System Performance and Reliability · Software Engineering Techniques and Practices
