A Large-Scale Exploit Instrumentation Study of AI/ML Supply Chain Attacks in Hugging Face Models

Beatrice Casey; Joanna C. S. Santos; Mehdi Mirakhorli

arXiv:2410.04490 · cs.CR · October 8, 2024

Open Access

TL;DR

This study examines the widespread use of unsafe serialization methods in Hugging Face models, demonstrating how these vulnerabilities can be exploited and assessing the platform's ability to detect such risks.

Contribution

The paper provides the first large-scale analysis of unsafe serialization in Hugging Face models and develops techniques to identify malicious or vulnerable models.

Findings

01. Many models use unsafe serialization methods

02. Hugging Face has limited detection of vulnerable models

03. Exploitation of unsafe serialization can compromise ML environments
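
Finding 03 rests on how Python's `pickle` format behaves on load. The following is a minimal sketch of that mechanism, not the authors' exploit instrumentation: the class name `HarmlessPayload` is hypothetical, and the payload is deliberately benign (it only calls `os.getcwd`).

```python
import os
import pickle


class HarmlessPayload:
    """Hypothetical stand-in for a tampered model object (not from the paper)."""

    def __reduce__(self):
        # pickle records "call os.getcwd()" instead of this object's state.
        # An attacker could name any importable callable and arguments here.
        return (os.getcwd, ())


blob = pickle.dumps(HarmlessPayload())

# Loading runs the recorded call; no HarmlessPayload instance is rebuilt.
result = pickle.loads(blob)
print(type(result).__name__)
```

The value returned by `pickle.loads` is whatever the recorded callable returned, which shows that deserializing untrusted bytes amounts to running code chosen by whoever produced them.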

Abstract

Advances in machine learning (ML) have given developers ample opportunity to build and deploy their own models. Hugging Face serves as an open-source platform where developers can share and download models, making ML development more collaborative. Before a model can be shared, it must be serialized. Certain Python serialization methods are considered unsafe because they are vulnerable to object injection. This paper investigates how pervasive these unsafe serialization methods are across Hugging Face and demonstrates, through an exploitation approach, that models using them can be exploited and shared, creating an unsafe environment for ML developers. We investigate to what extent Hugging Face is able to flag repositories and files using unsafe serialization methods, and develop a technique to detect…
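
One common way to inspect a pickle-based file without executing it is to walk its opcode stream and list the callables it would import. The sketch below uses the standard-library `pickletools` module for this. It is an illustrative heuristic under stated assumptions, not the detection technique the abstract refers to (the abstract is truncated before describing it), and the function name `pickle_imports` is hypothetical. It resolves `STACK_GLOBAL` from the two most recent string arguments and ignores memo lookups, so it can miss references in adversarially crafted files.

```python
import pickle
import pickletools
from typing import Set


def pickle_imports(data: bytes) -> Set[str]:
    """Return 'module.name' references found in a pickle, without loading it."""
    found = set()
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these arguments as "module name"
            module, name = arg.split(" ", 1)
            found.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: module and name are the last two strings pushed
            found.add(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return found


# Plain data imports nothing; a pickled callable reference is reported.
print(pickle_imports(pickle.dumps({"weights": [1.0, 2.0]})))  # set()
print(pickle_imports(pickle.dumps(len)))  # {'builtins.len'}
```

A scanner built this way would still need a policy for which imports to treat as suspicious; that policy is outside the scope of this sketch.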

Taxonomy

Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning