The Value of Context: Human versus Black Box Evaluators
Andrei Iakovlev, Annie Liang

TL;DR
This paper compares standardized machine evaluation with context-dependent human evaluation and finds that, in high-dimensional data environments, the benefit of additional covariates generally outweighs the value of contextual customization unless the agent has precise knowledge of the joint distribution of covariates.
Contribution
The paper introduces a framework to analyze the trade-off between standardized machine evaluations and context-dependent human assessments in high-dimensional settings.
Findings
The value of context is defined as the advantage of customizing which covariates are acquired for each individual, and is analyzed in high-dimensional data environments.
The benefit of additional covariates generally outweighs the value of context unless the agent has precise knowledge of the joint distribution of covariates.
The framework clarifies when an individual should prefer human over machine evaluation.
Abstract
Machine learning algorithms are now capable of performing evaluations previously conducted by human experts (e.g., medical diagnoses). How should we conceptualize the difference between evaluation by humans and by algorithms, and when should an individual prefer one over the other? We propose a framework to examine one key distinction between the two forms of evaluation: Machine learning algorithms are standardized, fixing a common set of covariates by which to assess all individuals, while human evaluators customize which covariates are acquired to each individual. Our framework defines and analyzes the advantage of this customization -- the value of context -- in environments with high-dimensional data. We show that unless the agent has precise knowledge about the joint distribution of covariates, the benefit of additional covariates generally outweighs the value of context.
Taxonomy
Topics: Evaluation and Performance Assessment
Methods: Sparse Evolutionary Training
