An Empirical Framework for Evaluating Semantic Preservation Using Hugging Face

Nan Jia; Anita Raja; Raffi Khatchadourian

arXiv:2512.07983 · cs.SE · December 10, 2025

TL;DR

This paper presents an empirical framework that leverages Hugging Face data to evaluate semantic preservation in machine learning models, helping ensure trustworthiness and detect semantic drift during model evolution.

Contribution

It introduces a large-scale dataset and a practical pipeline for assessing semantic preservation across ML model versions, supported by empirical case studies.
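As a minimal sketch of one extraction step such a pipeline needs, assuming a model card's YAML metadata has already been parsed into a Python dictionary, the snippet below flattens the `model-index` block that Hugging Face model cards use to report evaluation results. The function name `extract_metrics`, the `dataset/metric` key format, and the sample card are illustrative assumptions, not the authors' implementation.

```python
from typing import Any


def extract_metrics(card_metadata: dict[str, Any]) -> dict[str, float]:
    """Flatten a parsed model card's `model-index` block into {"dataset/metric": value}."""
    metrics: dict[str, float] = {}
    for model_entry in card_metadata.get("model-index", []):
        for result in model_entry.get("results", []):
            # Each result pairs a task with the dataset it was evaluated on.
            dataset = result.get("dataset", {}).get("name", "unknown")
            for metric in result.get("metrics", []):
                value = metric.get("value")
                # Keep only numeric values; bool is excluded because it subclasses int.
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    metrics[f"{dataset}/{metric.get('type', 'unknown')}"] = float(value)
    return metrics


# Hypothetical metadata for a single model revision.
card = {
    "model-index": [{
        "name": "demo-model",
        "results": [{
            "task": {"type": "text-classification"},
            "dataset": {"name": "sst2", "type": "glue"},
            "metrics": [{"type": "accuracy", "value": 0.91}],
        }],
    }]
}
print(extract_metrics(card))  # {'sst2/accuracy': 0.91}
```

Applying a function like this to the card at each commit would yield a per-commit metric series, which is the kind of input a drift check can compare across versions.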

Findings

1. Semantic drift can be detected via evaluation metrics across commits.
2. Common refactoring patterns are identified through commit message analysis.
3. The pipeline provides a foundation for defining community standards for semantic preservation.
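The second finding relies on commit message analysis. A deliberately simple keyword-matching sketch of that idea follows; the labels and keywords in `REFACTORING_PATTERNS` are hypothetical and do not reproduce the paper's taxonomy or classification method.

```python
from collections import Counter

# Hypothetical lexicon mapping a refactoring label to lowercase substrings.
REFACTORING_PATTERNS: dict[str, tuple[str, ...]] = {
    "quantization": ("quantiz", "int8", "4bit", "gguf"),
    "pruning": ("prune", "pruning", "sparsif"),
    "distillation": ("distil",),
    "format_conversion": ("safetensors", "onnx", "convert"),
    "fine_tuning": ("fine-tun", "finetun"),
}


def classify_commit(message: str) -> list[str]:
    """Return every label whose keywords appear in the commit message (case-insensitive)."""
    text = message.lower()
    return sorted(
        label
        for label, keywords in REFACTORING_PATTERNS.items()
        if any(keyword in text for keyword in keywords)
    )


def pattern_frequencies(messages: list[str]) -> Counter:
    """Count how often each label occurs across a repository's commit messages."""
    return Counter(label for message in messages for label in classify_commit(message))


messages = [
    "Upload INT8 quantized weights",
    "Convert model to safetensors",
    "Update README.md",
]
print(pattern_frequencies(messages))
```

Substring matching is crude: it misses paraphrases and can over-match generic words such as "convert", so a study at scale would likely need manual validation or a stronger classifier.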

Abstract

As machine learning (ML) becomes an integral part of high-autonomy systems, it is critical to ensure the trustworthiness of learning-enabled software systems (LESS). Yet the nondeterministic and run-time-defined semantics of ML complicate traditional software refactoring. We define semantic preservation in LESS as the property that optimizations of intelligent components do not alter the system's overall functional behavior. This paper introduces an empirical framework to evaluate semantic preservation in LESS by mining model evolution data from Hugging Face. We extract commit histories, model cards, and performance metrics from a large number of models. To establish baselines, we conducted case studies in three domains, tracing performance changes across versions. Our analysis demonstrates how semantic drift can be detected via evaluation metrics across commits…
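To make the drift-detection idea concrete, here is a minimal sketch that compares one evaluation metric between consecutive commits and flags changes larger than a tolerance. The `Revision` record, the commit identifiers, and the default absolute tolerance of 0.01 are assumptions for illustration; the criteria the paper actually uses are not stated in this abstract.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    commit: str    # commit identifier (hypothetical values below)
    metric: float  # one evaluation metric reported at that commit, e.g. accuracy


def detect_drift(history: list[Revision], tolerance: float = 0.01) -> list[tuple[str, str, float]]:
    """Flag consecutive revisions whose metric changes by more than `tolerance` (absolute)."""
    flagged = []
    for prev, curr in zip(history, history[1:]):
        delta = curr.metric - prev.metric
        if abs(delta) > tolerance:
            flagged.append((prev.commit, curr.commit, round(delta, 4)))
    return flagged


history = [Revision("a1", 0.910), Revision("b2", 0.912), Revision("c3", 0.870)]
print(detect_drift(history))  # [('b2', 'c3', -0.042)]
```

Because ML training is nondeterministic, a nonzero tolerance keeps run-to-run noise from being reported as drift. An absolute threshold suits metrics bounded in [0, 1]; a relative threshold may be more appropriate for unbounded metrics such as perplexity.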


Taxonomy

Topics: Data Stream Mining Techniques · Software Testing and Debugging Techniques · Software Engineering Research