An Empirical Framework for Evaluating Semantic Preservation Using Hugging Face
Nan Jia, Anita Raja, Raffi Khatchadourian

TL;DR
This paper presents an empirical framework that leverages Hugging Face data to evaluate semantic preservation in machine learning models, helping ensure trustworthiness and detect semantic drift during model evolution.
Contribution
It introduces a large-scale dataset and a practical pipeline for assessing semantic preservation across ML model versions, supported by empirical case studies.
Findings
Semantic drift can be detected via evaluation metrics across commits
Common refactoring patterns are identified through commit message analysis
The pipeline provides a foundation for defining community standards for semantic preservation
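As a rough illustration of the commit message analysis mentioned above, the sketch below tags commit messages with keyword-based categories and counts them. The category names, the regular expressions, and the helpers `tag_commit` and `pattern_counts` are assumptions made for this example; they are not the paper's taxonomy or tooling.

```python
import re
from collections import Counter

# Hypothetical keyword map: illustrative categories only, not taken from the paper.
PATTERNS = {
    "quantization": re.compile(r"\bquanti[sz]", re.I),
    "retraining": re.compile(r"\b(re-?train|fine-?tun)", re.I),
    "config_change": re.compile(r"\bconfig", re.I),
    "docs": re.compile(r"\b(readme|model card)\b", re.I),
}


def tag_commit(message):
    """Return the sorted list of category names whose pattern matches the message."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(message))


def pattern_counts(messages):
    """Count how often each category appears across a list of commit messages."""
    counts = Counter()
    for message in messages:
        counts.update(tag_commit(message))
    return counts
```

A message that matches no pattern simply gets an empty tag list, so unclassified commits can be inspected separately rather than forced into a category.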
Abstract
As machine learning (ML) becomes an integral part of high-autonomy systems, it is critical to ensure the trustworthiness of learning-enabled software systems (LESS). Yet, the nondeterministic and run-time-defined semantics of ML complicate traditional software refactoring. We define semantic preservation in LESS as the property that optimizations of intelligent components do not alter the system's overall functional behavior. This paper introduces an empirical framework to evaluate semantic preservation in LESS by mining model evolution data from HuggingFace. We extract commit histories, , and performance metrics from a large number of models. To establish baselines, we conducted case studies in three domains, tracing performance changes across versions. Our analysis demonstrates how semantic drift can be detected via evaluation metrics across commits…
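One simple way to picture drift detection via evaluation metrics across commits is a tolerance check on consecutive versions. The sketch below is only an assumption about how such a check could look: the `CommitMetric` record, the `detect_drift` helper, and the default tolerance of 0.01 are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class CommitMetric:
    commit_id: str   # identifier of the model commit (hypothetical field)
    value: float     # one evaluation metric reported at that commit, e.g. accuracy


def detect_drift(history, tolerance=0.01):
    """Flag consecutive commits whose metric changes by more than `tolerance`.

    `history` is a list of CommitMetric ordered from oldest to newest.
    Returns a list of (previous_commit, current_commit, delta) tuples.
    """
    flagged = []
    for prev, curr in zip(history, history[1:]):
        delta = curr.value - prev.value
        if abs(delta) > tolerance:
            flagged.append((prev.commit_id, curr.commit_id, delta))
    return flagged
```

For example, with made-up values `[CommitMetric("a", 0.90), CommitMetric("b", 0.905), CommitMetric("c", 0.85)]`, only the step from `b` to `c` would be flagged. A fixed absolute tolerance is the simplest choice; a real study would likely need a threshold per metric and per task.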
Taxonomy
Topics: Data Stream Mining Techniques · Software Testing and Debugging Techniques · Software Engineering Research
