ElicitationGPT: Text Elicitation Mechanisms via Language Models
Yifan Wu, Jason Hartline

TL;DR
This paper introduces ElicitationGPT, a mechanism that uses large language models to score elicited text against ground-truth text. The scoring is provably proper, and its alignment with human preferences is evaluated empirically.
Contribution
It develops a domain-knowledge-free, provably proper scoring mechanism for text elicitation using black-box language models, validated through empirical evaluation.
Findings
Mechanism aligns well with human preferences in peer review evaluation
Provably proper scoring achieved via black-box language models
Empirical results demonstrate effectiveness in real-world peer grading
Abstract
Scoring rules evaluate probabilistic forecasts of an unknown state against the realized state and are a fundamental building block in the incentivized elicitation of information. This paper develops mechanisms for scoring elicited text against ground truth text by reducing the textual information elicitation problem to a forecast elicitation problem, via domain-knowledge-free queries to a large language model (specifically ChatGPT), and empirically evaluates their alignment with human preferences. Our theoretical analysis shows that the reduction achieves provable properness via black-box language models. The empirical evaluation is conducted on peer reviews from a peer-grading dataset, in comparison to manual instructor scores for the peer reviews. Our results suggest a paradigm of algorithmic artificial intelligence that may be useful for developing artificial intelligence…
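To make the notion of a proper scoring rule in the abstract concrete, here is a minimal sketch of the standard quadratic (Brier-style) rule for a binary state. It is not the ElicitationGPT mechanism and does not involve a language model; the function names, the belief value 0.7, and the grid of candidate reports are illustrative assumptions. It only checks numerically that a forecaster maximizes expected score by reporting their true belief, which is what "proper" means.

```python
def quadratic_score(forecast: float, outcome: int) -> float:
    """Quadratic (Brier-style) score for a binary state; higher is better.

    forecast: reported probability that the state equals 1
    outcome:  realized state, 0 or 1
    """
    return 1.0 - (outcome - forecast) ** 2


def expected_score(report: float, belief: float) -> float:
    """Expected score of reporting `report` when the state is 1 with probability `belief`."""
    return (belief * quadratic_score(report, 1)
            + (1.0 - belief) * quadratic_score(report, 0))


# Hypothetical forecaster who believes the state is 1 with probability 0.7.
belief = 0.7

# Candidate reports on a grid of 0.00, 0.01, ..., 1.00 (an illustrative choice).
reports = [i / 100 for i in range(101)]

# Properness: the truthful report maximizes the expected score.
best_report = max(reports, key=lambda r: expected_score(r, belief))
print(best_report)
```

Under these assumptions the maximizing report coincides with the belief, so misreporting cannot raise the forecaster's expected score. The paper's contribution, per the abstract, is reducing text elicitation to this kind of forecast elicitation.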
Taxonomy
Topics: Topic Modeling
