"I see models being a whole other thing": An Empirical Study of Pre-Trained Model Naming Conventions and A Tool for Enhancing Naming Consistency
Wenxin Jiang, Mingyu Kim, Chingwo Cheung, Heesoo Kim, George K. Thiruvathukal, James C. Davis

TL;DR
This paper empirically studies naming practices of pre-trained models on Hugging Face, revealing mismatches with user preferences, and introduces DARA, an automated tool that detects naming inconsistencies (94% accuracy in identifying model types), to improve model reuse and security.
Contribution
It provides the first systematic analysis of PTM naming conventions and introduces DARA, a novel automated technique for detecting naming inconsistencies based on architectural information.
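To make the idea of a naming inconsistency concrete, the sketch below checks whether the architecture family claimed in a model's name agrees with the architecture declared in its configuration. This is NOT DARA's method, which the paper describes as based on architectural information and which reaches 94% accuracy. It is a simple keyword lookup under stated assumptions: the repository's `config.json` has been loaded into a dict that exposes a `model_type` field, and `NAME_HINTS`, `name_tokens`, and `check_name` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical mapping from architecture keywords that may appear as a token in a
# model name to the config "model_type" value they imply (illustrative, not exhaustive).
NAME_HINTS = {
    "bert": "bert",
    "roberta": "roberta",
    "gpt2": "gpt2",
    "t5": "t5",
    "llama": "llama",
}


def name_tokens(repo_id: str) -> list[str]:
    """Split a repo id like 'org/my-roberta-base' into lowercase name tokens."""
    name = repo_id.split("/")[-1].lower()
    return [t for t in re.split(r"[-_.]", name) if t]


def check_name(repo_id: str, config: dict) -> str:
    """Compare the architecture implied by the name with config['model_type'].

    Returns 'consistent', 'inconsistent', or 'unknown' (no recognizable
    keyword in the name, or no model_type in the config).
    """
    claimed = {NAME_HINTS[t] for t in name_tokens(repo_id) if t in NAME_HINTS}
    actual = config.get("model_type")
    if not claimed or actual is None:
        return "unknown"
    return "consistent" if actual in claimed else "inconsistent"


if __name__ == "__main__":
    print(check_name("someorg/bert-base-sentiment", {"model_type": "roberta"}))
```

Because it matches whole name tokens only, this sketch misses variants such as `distilbert` and cannot judge names with no architecture keyword at all; a learned approach over architectural features, like the one the paper proposes, is meant to cover cases a keyword table cannot.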
Findings
A survey of 108 Hugging Face users shows a mismatch between user preferences and current naming practices.
DARA achieves 94% accuracy in identifying model types.
Potential applications include model validation and plagiarism detection.
Abstract
As innovation in deep learning continues, many engineers are incorporating Pre-Trained Models (PTMs) as components in computer systems. Some PTMs are foundation models, and others are fine-tuned variations adapted to different needs. When these PTMs are named well, it facilitates model discovery and reuse. However, prior research has shown that model names are not always well chosen and can sometimes be inaccurate and misleading. The naming practices for PTM packages have not been systematically studied, which hampers engineers' ability to efficiently search for and reliably reuse these models. In this paper, we conduct the first empirical investigation of PTM naming practices in the Hugging Face PTM registry. We begin by reporting on a survey of 108 Hugging Face users, highlighting differences from traditional software package naming and presenting findings on PTM naming practices. The…
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Software Testing and Debugging Techniques
