Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach
Yue Liu, Dawen Zhang, Boming Xia, Julia Anticev, Tunde Adebayo, Zhenchang Xing, Moses Machao

TL;DR
This paper introduces a blockchain-based Data Bill of Materials (DataBOM) to enhance traceability, accountability, and governance of datasets in AI data supply chains, addressing challenges of complexity and stakeholder diversity.
Contribution
It adapts the Software Bill of Materials concept to data governance, proposing a blockchain-enabled platform and interaction protocol for managing dataset dependencies and metadata.
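The paper builds its platform on a blockchain. The Python sketch below uses no blockchain at all. It only illustrates, with a hash-chained, append-only list, why anchoring DataBOM records in a ledger makes after-the-fact edits detectable. The `Ledger` class, its methods, and the record fields are hypothetical and are not the paper's platform or interaction protocol.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Ledger:
    """Hypothetical append-only, hash-chained log standing in for a blockchain."""
    blocks: list = field(default_factory=list)

    def append(self, record: dict) -> str:
        # Each block commits to its own payload and to the previous block's hash.
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        payload = json.dumps(record, sort_keys=True)
        block_hash = sha256_hex((prev + payload).encode("utf-8"))
        self.blocks.append({"prev": prev, "payload": payload, "hash": block_hash})
        return block_hash

    def verify(self) -> bool:
        # Recompute every hash; any edited payload or broken link fails the check.
        prev = GENESIS
        for block in self.blocks:
            expected = sha256_hex((prev + block["payload"]).encode("utf-8"))
            if block["prev"] != prev or block["hash"] != expected:
                return False
            prev = block["hash"]
        return True


if __name__ == "__main__":
    ledger = Ledger()
    ledger.append({"dataset": "raw-corpus", "action": "register"})
    ledger.append({"dataset": "clean-corpus", "action": "derive", "from": ["raw-corpus"]})
    print(ledger.verify())  # True
    # Tamper with the first record after it was logged.
    ledger.blocks[0]["payload"] = json.dumps(
        {"dataset": "raw-corpus", "action": "delete"}, sort_keys=True
    )
    print(ledger.verify())  # False
```

A real blockchain adds distributed consensus among stakeholders on top of this hash chaining. That consensus is what stops a single party from simply recomputing every hash after tampering.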
Findings
Demonstrates feasibility through a case study
Evaluates performance with quantitative analysis
Shows improved data traceability and accountability
Abstract
In the era of advanced artificial intelligence, highlighted by large-scale generative models such as GPT-4, ensuring the traceability, verifiability, and reproducibility of datasets throughout their lifecycle is paramount for research institutions and technology companies. These organisations increasingly rely on vast corpora to train and fine-tune advanced AI models, which results in intricate data supply chains that demand effective data governance mechanisms. The challenge is compounded because diverse stakeholders may use different tools, often without adequate measures to ensure the accountability of data and the reliability of outcomes. In this study, we adapt the concept of "Software Bill of Materials" to the field of data governance and management to address these challenges, and introduce "Data Bill of Materials" (DataBOM) to capture the dependency relationship between…
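The following Python sketch shows one way a DataBOM-style record could capture dependencies between datasets and support lineage tracing and integrity checks. It assumes that each dataset version carries a SHA-256 content hash, a responsible supplier, and references to its upstream datasets. `DataBOMEntry`, `DataBOMRegistry`, and every field name are hypothetical illustrations, not the paper's DataBOM schema.

```python
import hashlib
from dataclasses import dataclass


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DataBOMEntry:
    """Hypothetical DataBOM record; field names are illustrative, not the paper's schema."""
    name: str
    version: str
    content_hash: str        # SHA-256 digest of the dataset's bytes
    supplier: str            # stakeholder accountable for this dataset version
    depends_on: tuple = ()   # (name, version) keys of upstream datasets


class DataBOMRegistry:
    """In-memory registry; upstream datasets must be registered first, so the graph stays acyclic."""

    def __init__(self) -> None:
        self._entries: dict = {}

    def register(self, entry: DataBOMEntry) -> None:
        key = (entry.name, entry.version)
        if key in self._entries:
            raise ValueError(f"{key} already registered")
        missing = [d for d in entry.depends_on if d not in self._entries]
        if missing:
            raise ValueError(f"unknown upstream datasets: {missing}")
        self._entries[key] = entry

    def lineage(self, name: str, version: str) -> list:
        """Return every transitive upstream (name, version) key, via a stack-based depth-first walk."""
        seen, order = set(), []
        stack = list(self._entries[(name, version)].depends_on)
        while stack:
            key = stack.pop()
            if key in seen:
                continue
            seen.add(key)
            order.append(key)
            stack.extend(self._entries[key].depends_on)
        return order

    def verify(self, name: str, version: str, data: bytes) -> bool:
        """Check that the given bytes match the hash recorded for that dataset version."""
        return self._entries[(name, version)].content_hash == digest(data)


if __name__ == "__main__":
    reg = DataBOMRegistry()
    raw = b"raw web crawl"
    reg.register(DataBOMEntry("crawl", "1.0", digest(raw), "Org A"))
    reg.register(DataBOMEntry("filtered", "1.0", digest(b"filtered"), "Org B", (("crawl", "1.0"),)))
    reg.register(DataBOMEntry("finetune-mix", "2.0", digest(b"mix"), "Org C", (("filtered", "1.0"),)))
    print(reg.lineage("finetune-mix", "2.0"))  # [('filtered', '1.0'), ('crawl', '1.0')]
    print(reg.verify("crawl", "1.0", raw))     # True
```

Keying entries by name and version, and refusing unknown upstream references, keeps the dependency graph a DAG. Any downstream dataset can then be traced back to every contributing dataset and its accountable supplier.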
Taxonomy
Topics: Blockchain Technology Applications and Security · Cloud Data Security Solutions · Privacy-Preserving Technologies in Data
