HuggingGraph: Understanding the Supply Chain of LLM Ecosystem

Mohammad Shahedur Rahman; Peng Gao; Yuede Ji

arXiv:2507.14240 · cs.CL · September 8, 2025

TL;DR

This paper introduces HuggingGraph, a methodology and graph-based model to analyze the supply chain of large language models, revealing relationships and potential risks inherited from datasets and previous models.

Contribution

It presents a novel graph-based approach to systematically analyze the supply chain of LLMs, including a large heterogeneous graph with over 400,000 nodes.
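As a rough illustration of what a heterogeneous supply chain graph could look like, the sketch below stores typed nodes (models and datasets) and directed dependency edges, then walks upward to list everything a model inherits from. The class name `SupplyChainGraph`, its methods, and the node names are all hypothetical and chosen for this example; they are not taken from the paper, whose actual schema and edge types are not described in this summary.

```python
from collections import defaultdict, deque


class SupplyChainGraph:
    """Toy heterogeneous directed graph (hypothetical, not the paper's schema).

    Nodes are typed as 'model' or 'dataset'. An edge upstream -> downstream
    means the downstream node was built from or trained on the upstream node.
    """

    def __init__(self):
        self.node_type = {}               # node name -> 'model' | 'dataset'
        self.parents = defaultdict(set)   # node name -> direct upstream nodes

    def add_node(self, name, kind):
        if kind not in ("model", "dataset"):
            raise ValueError(f"unknown node type: {kind}")
        self.node_type[name] = kind

    def add_dependency(self, upstream, downstream):
        for node in (upstream, downstream):
            if node not in self.node_type:
                raise KeyError(f"unknown node: {node}")
        self.parents[downstream].add(upstream)

    def upstream(self, name):
        """Return every node reachable by following dependencies upward."""
        seen = set()
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


# Hypothetical example data: a chat model fine-tuned from a base model.
graph = SupplyChainGraph()
graph.add_node("corpus-a", "dataset")
graph.add_node("instruct-set", "dataset")
graph.add_node("base-model", "model")
graph.add_node("chat-model", "model")
graph.add_dependency("corpus-a", "base-model")
graph.add_dependency("base-model", "chat-model")
graph.add_dependency("instruct-set", "chat-model")

ancestors = graph.upstream("chat-model")
```

Here `ancestors` contains both datasets and the base model, which is the kind of provenance question a supply chain graph is meant to answer. A breadth-first walk is used only for simplicity; any graph traversal would serve.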

Findings

1. Identified complex relationships between models and datasets.

2. Revealed potential vulnerabilities inherited from data sources.

3. Provided insights for improving model fairness and compliance.
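To make the idea of inherited vulnerabilities concrete, here is a minimal sketch that answers the reverse question: given a dataset flagged as problematic, which models sit downstream of it? The function `downstream_models`, the edge list, and all node names are assumptions made up for illustration and do not come from the paper.

```python
from collections import deque


def downstream_models(edges, node_types, flagged):
    """Return the sorted names of models that transitively depend on `flagged`.

    edges: iterable of (upstream, downstream) pairs.
    node_types: dict mapping node name to 'model' or 'dataset'.
    """
    # Build a forward adjacency map: upstream node -> direct dependents.
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    # Breadth-first walk from the flagged node along dependency edges.
    seen = set()
    queue = deque([flagged])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)

    return sorted(n for n in seen if node_types.get(n) == "model")


# Hypothetical example: dataset-x feeds model-a, which is the base of model-b.
edges = [
    ("dataset-x", "model-a"),
    ("model-a", "model-b"),
    ("dataset-y", "model-c"),
]
node_types = {
    "dataset-x": "dataset",
    "dataset-y": "dataset",
    "model-a": "model",
    "model-b": "model",
    "model-c": "model",
}

affected = downstream_models(edges, node_types, "dataset-x")
```

In this toy data, `affected` lists `model-a` and `model-b` but not `model-c`, since `model-c` depends only on `dataset-y`. Reachability is a coarse proxy: it shows which models could have inherited an issue, not that they did.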

Abstract

Large language models (LLMs) leverage deep learning architectures to process and predict sequences of words, enabling them to perform a wide range of natural language processing tasks, such as translation, summarization, question answering, and content generation. As existing LLMs are often built from base models or other pre-trained models and use external datasets, they can inevitably inherit vulnerabilities, biases, or malicious components that exist in previous models or datasets. Therefore, it is critical to understand these components' origin and development process to detect potential risks, improve model fairness, and ensure compliance with regulatory frameworks. Motivated by that, this project aims to study such relationships between models and datasets, which are the central parts of the LLM supply chain. First, we design a methodology to systematically collect LLMs' supply…

Taxonomy

Topics: Sustainable Industrial Ecology · Scientific Computing and Data Management · Blockchain Technology Applications and Security