Designing Disaggregated Evaluations of AI Systems: Choices, Considerations, and Tradeoffs
Solon Barocas, Anhong Guo, Ece Kamar, Jacquelyn Krones, Meredith Ringel Morris, Jennifer Wortman Vaughan, Duncan Wadsworth, Hanna Wallach

TL;DR
This paper discusses the complex design choices involved in creating disaggregated evaluations of AI systems, emphasizing how these choices impact results, interpretations, and societal effects, and advocates for better documentation of these decisions.
Contribution
It highlights key considerations and tradeoffs in designing disaggregated evaluations of AI systems and calls for documenting these design choices so that others can better interpret an evaluation's results and conclusions.
Findings
Design choices significantly influence evaluation outcomes.
Transparent documentation aids interpretation and societal impact assessment.
Understanding tradeoffs improves evaluation robustness.
Abstract
Disaggregated evaluations of AI systems, in which system performance is assessed and reported separately for different groups of people, are conceptually simple. However, their design involves a variety of choices. Some of these choices influence the results that will be obtained, and thus the conclusions that can be drawn; others influence the impacts -- both beneficial and harmful -- that a disaggregated evaluation will have on people, including the people whose data is used to conduct the evaluation. We argue that a deeper understanding of these choices will enable researchers and practitioners to design careful and conclusive disaggregated evaluations. We also argue that better documentation of these choices, along with the underlying considerations and tradeoffs that have been made, will help others when interpreting an evaluation's results and conclusions.
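To make the basic idea concrete, here is a minimal sketch of a disaggregated evaluation: one metric computed overall and then separately per group. It is an illustration only, not the authors' method. The function name `disaggregated_accuracy`, the choice of accuracy as the metric, and the group labels "a" and "b" are all assumptions made for this example. Each of them is itself a design choice of the kind the abstract describes.

```python
from collections import defaultdict


def disaggregated_accuracy(y_true, y_pred, groups):
    """Return (overall_accuracy, {group: (accuracy, sample_count)}).

    Hypothetical helper for illustration; accuracy is an assumed metric.
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred, and groups must have equal length")

    correct = defaultdict(int)  # correct predictions per group
    total = defaultdict(int)    # examples per group

    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)

    # Report the sample count next to each accuracy: a small group
    # yields a noisy estimate, which matters when drawing conclusions.
    per_group = {g: (correct[g] / total[g], total[g]) for g in total}
    n = len(y_true)
    overall = sum(correct.values()) / n if n else float("nan")
    return overall, per_group


# Toy data (made up for this sketch): eight examples, two groups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

overall, per_group = disaggregated_accuracy(y_true, y_pred, groups)
print(overall)    # 0.75
print(per_group)  # {'a': (1.0, 4), 'b': (0.5, 4)}
```

In this toy data the aggregate accuracy of 0.75 hides a gap between the two groups (1.0 versus 0.5), which is the kind of difference a disaggregated evaluation is meant to surface. How groups are defined, which metric is used, and how many examples each group contains would all change what the numbers show.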
