Evaluating Structured Documentation as a Tool for Reflexivity in Dataset Development
Eshta Bhardwaj, Ciara Zogheib, Christoph Becker

TL;DR
This paper investigates how well structured dataset documentation frameworks incorporate reflexivity concepts, finding a general lack of engagement and proposing strategies to improve reflexivity in dataset development.
Contribution
The paper empirically analyzes existing documentation frameworks and the related literature, identifies gaps in how they integrate reflexivity, and offers actionable recommendations and extended datasheet questions to incorporate it more fully.
Findings
Frameworks and their applications show little engagement with major reflexivity themes
Extended datasheet questions are proposed to enhance the incorporation of reflexivity
A codebook of major reflexivity topics is presented
Abstract
It is prominently recognized that dataset development in machine learning is a value-laden process from problem formulation to data processing, use, and reuse. Structured documentation frameworks such as datasheets, data statements, and dataset nutrition labels have been created to aid developers in documenting how their datasets were produced and, according to the creators of the frameworks, to facilitate reflexivity in dataset development. While reflexivity is a stated goal, it is unclear whether and to what extent these structured dataset documentation frameworks incorporate concepts from reflexivity literature (at FAccT and elsewhere) and whether the use of the frameworks demonstrates reflexivity. Here, we adopt mixed-method thematic analysis and corpus-assisted discourse analysis to explore how reflexivity is incorporated in structured documentation frameworks and their responses.…
