TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri, Mircea Cimpoi, Morgan Scheuerman, Victoria Matthews, Alice Xiang

TL;DR
This paper introduces TEDI, a set of 143 indicators for systematically analyzing and comparing the trustworthy and ethical attributes of multimodal datasets as described in their documentation, with the aim of improving transparency in AI datasets.
Contribution
The paper presents TEDI, a framework of fine-grained indicators for empirically assessing the trustworthy and ethical aspects of dataset documentation, supported by manual annotation and analysis of over 100 multimodal datasets that include human voices.
Findings
Few datasets document consent, privacy, and harmful content indicators.
Documentation quality varies with data collection methods.
Scraping is a common collection method, but the documentation of scraped datasets tends to cover fewer ethical indicators than that of directly collected datasets.
Abstract
Dataset transparency is a key enabler of responsible AI, but insights into multimodal dataset attributes that impact trustworthy and ethical aspects of AI applications remain scarce and are difficult to compare across datasets. To address this challenge, we introduce Trustworthy and Ethical Dataset Indicators (TEDI) that facilitate the systematic, empirical analysis of dataset documentation. TEDI encompasses 143 fine-grained indicators that characterize trustworthy and ethical attributes of multimodal datasets and their collection processes. The indicators are framed to extract verifiable information from dataset documentation. Using TEDI, we manually annotated and analyzed over 100 multimodal datasets that include human voices. We further annotated data sourcing, size, and modality details to gain insights into the factors that shape trustworthy and ethical dimensions across datasets.…
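The kind of comparison the abstract describes, indicator coverage broken down by how the data was sourced, can be sketched roughly as follows. This is a minimal illustration only, not the authors' tooling: the indicator names, dataset names, annotations, and the `coverage_by_method` helper are all hypothetical placeholders, and none of the values come from the paper.

```python
from collections import defaultdict

# Hypothetical annotations (NOT from the paper): for each dataset, the
# collection method and the set of indicators its documentation satisfies.
ANNOTATIONS = {
    "dataset_a": {"method": "scraped", "documented": {"license_stated"}},
    "dataset_b": {"method": "scraped", "documented": set()},
    "dataset_c": {"method": "direct",
                  "documented": {"license_stated", "consent_obtained"}},
    "dataset_d": {"method": "direct",
                  "documented": {"consent_obtained", "privacy_measures"}},
}

# Illustrative indicator names; TEDI itself defines 143 indicators.
INDICATORS = ["license_stated", "consent_obtained", "privacy_measures"]


def coverage_by_method(annotations, indicators):
    """Return {method: {indicator: fraction of datasets documenting it}}."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for record in annotations.values():
        method = record["method"]
        totals[method] += 1
        for indicator in record["documented"]:
            counts[method][indicator] += 1
    return {
        method: {ind: counts[method][ind] / totals[method] for ind in indicators}
        for method in totals
    }


if __name__ == "__main__":
    for method, coverage in coverage_by_method(ANNOTATIONS, INDICATORS).items():
        print(method, coverage)
```

Treating each indicator as a yes/no fact about the documentation keeps the comparison verifiable: a coverage fraction can always be traced back to the specific datasets whose documentation does or does not state the item.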
