CaTE Data Curation for Trustworthy AI
Mary Versa Clemens-Sewall, Christopher Cervantes, Emma Rafkin, J. Neil Otte, Tom Magelinski, Libby Lewis, Michelle Liu, Dana Udwin, Monique Kirkman-Bey

TL;DR
This paper offers practical guidance and a structured approach for data scientists to enhance trustworthiness in AI systems through effective data curation, integrating tools, methods, and best practices from academic literature.
Contribution
It provides a comprehensive, step-by-step framework for data curation aimed at improving AI trustworthiness, including analysis of tools, approaches, and open-source implementations.
Findings
Structured data curation steps for trustworthy AI
Evaluation of open-source tools for data quality and bias mitigation
Guidelines for integrating trustworthiness practices into AI development
Abstract
This report provides practical guidance to teams designing or developing AI-enabled systems on how to promote trustworthiness during the data curation phase of development. We first define data, the data curation phase, and trustworthiness. We then describe a series of steps that the development team, especially data scientists, can take to build a trustworthy AI-enabled system. We enumerate the sequence of core steps and trace parallel paths where alternatives exist. The description of each step covers its strengths, weaknesses, preconditions, outcomes, and relevant open-source software implementations. Overall, this report synthesizes data curation tools and approaches from the relevant academic literature, and our goal is to equip readers with a diverse yet coherent set of practices for improving AI trustworthiness.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cloud Data Security Solutions · IoT and Edge/Fog Computing
