Artificially Intelligent Opinion Polling
Roberto Cerina, Raymond Duch

TL;DR
This paper introduces a methodology for making representative public opinion inferences from unrepresentative, high-frequency social media samples using bias correction and LLMs for data extraction, achieving election prediction accuracy comparable to traditional polling.
Contribution
It presents a novel bias-corrected model for online sample selection and a protocol leveraging LLMs to extract survey-like data from social media, enabling accurate, timely opinion polling from unstructured data.
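The extraction protocol itself is not reproduced on this page. As a hedged sketch of one step such a protocol typically needs, the code below assumes the LLM has been prompted to reply with a JSON object. It validates that reply against a survey codebook so that malformed or out-of-range answers become missing values rather than invented categories. The field names, the allowed categories, `CODEBOOK`, and `parse_llm_answer` are all hypothetical and are not taken from the paper.

```python
import json

# Hypothetical codebook: survey-like fields and the answers allowed for each.
CODEBOOK = {
    "age_group": {"18-29", "30-44", "45-64", "65+"},
    "gender": {"female", "male"},
    "vote_intention": {"democrat", "republican", "other", "abstain"},
}


def parse_llm_answer(raw):
    """Turn an LLM's JSON reply into one survey record; invalid answers become None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    # A reply that parses but is not a JSON object (e.g. a list) is treated as empty.
    if not isinstance(data, dict):
        data = {}
    record = {}
    for field, allowed in CODEBOOK.items():
        value = data.get(field)
        # Normalise strings, then keep them only if they match the codebook.
        if isinstance(value, str) and value.strip().lower() in allowed:
            record[field] = value.strip().lower()
        else:
            record[field] = None
    return record


reply = '{"age_group": "30-44", "gender": "Female", "vote_intention": "maybe"}'
print(parse_llm_answer(reply))
```

Mapping unusable answers to None keeps each classified user in the data as a row with missing fields. Downstream modelling can then decide how to handle them, instead of silently absorbing a category the codebook never defined.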
Findings
The bias-corrected model outperforms traditional MrP on samples subject to online selection.
LLMs accurately classify social media users' demographics and opinions.
AI-based polling estimates match high-quality traditional polls for the 2020 election.
Abstract
We seek to democratise public-opinion research by providing practitioners with a general methodology for making representative inferences from cheap, high-frequency, highly unrepresentative samples. We focus specifically on samples that are readily available in moderate sizes. To this end, we make two major contributions: 1) we introduce a general sample-selection process, which we name online selection, and show that it is a special case of selection on the dependent variable. We improve MrP (multilevel regression and poststratification) for severely biased samples by adding a bias-correction term in the style of King and Zeng to the logistic-regression framework. We show that this bias-corrected model outperforms traditional MrP under online selection and achieves performance similar to random sampling in a vast array of scenarios; 2) we present a protocol for using Large Language Models (LLMs) to extract structured, survey-like data from…
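The abstract describes adding a King and Zeng-style bias-correction term to the logistic regression inside MrP. The following is a minimal sketch, not the authors' model, which places its correction inside a multilevel regression. It makes two assumptions: selection depends on the outcome only, and an external estimate of the population share τ is available. Under those assumptions, it applies King and Zeng's prior correction to a fitted logit intercept, β̂0 − ln[((1 − τ)/τ)(ȳ/(1 − ȳ))], and then poststratifies cell-level predictions using population cell counts. The helper names (`prior_corrected_intercept`, `poststratify`) and all numbers are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Logistic function mapping log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prior_corrected_intercept(beta0_hat, sample_share, population_share):
    """King and Zeng's prior correction of a logit intercept.

    beta0_hat:        intercept fitted on the selected (biased) sample
    sample_share:     share of y = 1 in the sample (y-bar)
    population_share: assumed share of y = 1 in the population (tau)
    Slope coefficients are left unchanged: when selection depends on y only,
    they remain consistent.
    """
    offset = math.log(((1.0 - population_share) / population_share)
                      * (sample_share / (1.0 - sample_share)))
    return beta0_hat - offset


def poststratify(cell_linear_predictors, cell_counts):
    """Population-weighted mean of cell-level predicted probabilities."""
    total = sum(cell_counts)
    weighted = sum(inv_logit(eta) * n
                   for eta, n in zip(cell_linear_predictors, cell_counts))
    return weighted / total


# Intercept-only check: the sample is 60% "yes" while the population is 30% "yes".
b0 = prior_corrected_intercept(logit(0.6), sample_share=0.6, population_share=0.3)
print(round(inv_logit(b0), 6))
```

In the intercept-only case, the corrected intercept maps back to the assumed population share of 0.3. For the poststratification step, `poststratify([0.0, logit(0.2)], [100, 300])` weights cell probabilities of 0.5 and 0.2 by their counts and returns 0.275.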
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling
