Fine-Grained Traceability for Transparent ML Pipelines
Liping Chen, Mujie Liu, Haytham Fayek

TL;DR
FG-Trac is a novel, model-agnostic framework that provides verifiable, fine-grained traceability of individual data samples throughout machine learning pipelines, enhancing transparency and accountability.
Contribution
FG-Trac introduces a mechanism for sample-level traceability that combines cryptographic commitments with contribution scoring and requires no changes to existing models.
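To make the idea of committing to samples and verifying their records concrete, here is a minimal sketch in Python. It is not FG-Trac's actual construction, which this summary does not specify. It assumes a salted SHA-256 hash as the commitment and an append-only, hash-chained log as the record store. The names `commit`, `TraceLog`, `append`, `verify`, and `was_used` are hypothetical and chosen for illustration only.

```python
import hashlib
import json
import os


def commit(sample: bytes, nonce: bytes) -> str:
    """Salted SHA-256 commitment to a sample's raw bytes (assumed scheme)."""
    return hashlib.sha256(nonce + sample).hexdigest()


def _digest(body: dict) -> str:
    """Deterministic digest of a record body (keys sorted for stability)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class TraceLog:
    """Hypothetical append-only, hash-chained log of sample lifecycle events."""

    GENESIS = "0" * 64  # placeholder digest that precedes the first record

    def __init__(self) -> None:
        self.records: list[dict] = []

    def append(self, commitment: str, stage: str) -> dict:
        """Record that the committed sample passed through a pipeline stage."""
        prev = self.records[-1]["digest"] if self.records else self.GENESIS
        body = {"commitment": commitment, "stage": stage, "prev": prev}
        record = {**body, "digest": _digest(body)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Check that no record was altered, removed, or reordered."""
        prev = self.GENESIS
        for r in self.records:
            body = {"commitment": r["commitment"], "stage": r["stage"], "prev": r["prev"]}
            if r["prev"] != prev or r["digest"] != _digest(body):
                return False
            prev = r["digest"]
        return True

    def was_used(self, sample: bytes, nonce: bytes, stage: str) -> bool:
        """Check whether a sample (opened with its nonce) was logged at a stage."""
        c = commit(sample, nonce)
        return any(r["commitment"] == c and r["stage"] == stage for r in self.records)


# Usage: log one sample through two stages, then check the chain.
log = TraceLog()
nonce = os.urandom(16)  # fixed-length random salt kept by the data owner
c = commit(b"sample-0", nonce)
log.append(c, "preprocessing")
log.append(c, "training")
print(log.verify(), log.was_used(b"sample-0", nonce, "training"))
```

In this sketch, chaining each record to the digest of the previous one means that editing any earlier entry invalidates every later digest, which is one common way to let a third party check that records "remain intact over time". The model itself is never touched, in keeping with the model-agnostic claim.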
Findings
Preserves predictive performance while enabling traceability
Provides verifiable evidence of sample usage and propagation
Works with diverse ML pipeline architectures
Abstract
Modern machine learning systems are increasingly realised as multistage pipelines, yet existing transparency mechanisms typically operate at a model level: they describe what a system is and why it behaves as it does, but not how individual data samples are operationally recorded, tracked, and verified as they traverse the pipeline. This absence of verifiable, sample-level traceability leaves practitioners and users unable to determine whether a specific sample was used, when it was processed, or whether the corresponding records remain intact over time. We introduce FG-Trac, a model-agnostic framework that establishes verifiable, fine-grained sample-level traceability throughout machine learning pipelines. FG-Trac defines an explicit mechanism for capturing and verifying sample lifecycle events across preprocessing and training, computes contribution scores explicitly grounded in…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Ethics and Social Impacts of AI
