Aggregated Individual Reporting for Post-Deployment Evaluation
Jessica Dai, Inioluwa Deborah Raji, Benjamin Recht, Irene Y. Chen

TL;DR
This paper proposes Aggregated Individual Reporting (AIR), a framework for post-deployment AI evaluation that collects and aggregates user reports to identify safety issues and inform improvements, promoting democratic oversight.
Contribution
The paper introduces AIR as a mechanism for integrating individual user reports into systematic post-deployment AI evaluation, with an emphasis on democratic participation.
Findings
AIR can reveal safety and performance insights not captured by static benchmarks.
Aggregation of reports helps prioritize issues for system improvement.
The framework supports more transparent and accountable AI deployment.
Abstract
The need for developing model evaluations beyond static benchmarking, especially in the post-deployment phase, is now well understood. At the same time, concerns about the concentration of power in deployed AI systems have sparked a keen interest in 'democratic' or 'public' AI. In this work, we bring these two ideas together by proposing mechanisms for aggregated individual reporting (AIR), a framework for post-deployment evaluation that relies on individual reports from the public. An AIR mechanism allows those who interact with a specific, deployed (AI) system to report when they feel that they may have experienced something problematic; these reports are then aggregated over time, with the goal of evaluating the relevant system in a fine-grained manner. This position paper argues that individual experiences should be understood as an integral part of post-deployment evaluation, and…
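The abstract describes reports being collected from individuals and then aggregated over time. The following is a minimal Python sketch of what that aggregation step might look like, not the authors' method: the `Report` fields, the weekly bucketing, the fixed count threshold, and the toy reports are all assumptions made here for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    """One individual report. Field names are hypothetical."""
    day: date
    category: str  # issue label, e.g. chosen by the reporter


def aggregate_by_week(reports):
    """Count reports per (ISO year, ISO week, category)."""
    counts = Counter()
    for r in reports:
        iso = r.day.isocalendar()
        counts[(iso[0], iso[1], r.category)] += 1
    return counts


def flag_categories(counts, threshold):
    """Return sorted keys whose weekly count reaches the threshold.

    A fixed threshold is an assumption of this sketch; a real mechanism
    could use any aggregation or testing rule.
    """
    return sorted(key for key, n in counts.items() if n >= threshold)


# Toy, made-up reports used only to exercise the functions.
reports = [
    Report(date(2025, 3, 3), "incorrect medical advice"),
    Report(date(2025, 3, 4), "incorrect medical advice"),
    Report(date(2025, 3, 5), "offensive output"),
    Report(date(2025, 3, 12), "incorrect medical advice"),
]
counts = aggregate_by_week(reports)
flagged = flag_categories(counts, threshold=2)
```

Here `flagged` holds the week and category pairs that accumulated at least two reports. Counting within weekly buckets is only one simple choice among many possible aggregation rules.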
