TL;DR
LEAP is an end-to-end library that enables social scientists to efficiently analyze unstructured data like Tweets by automatically extracting semantic information and answering natural language queries using machine learning.
Contribution
The paper introduces LEAP, a novel system that uses ML to automate answering social science queries over unstructured data, handling vague queries and generating code cost-effectively.
Findings
Achieves 100% pass@3 and 92% pass@1 on the QUIET-ML dataset.
Cost-effective with an average end-to-end cost of $1.06 per query.
Successfully extends unstructured data analysis for social science applications.
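This summary does not say how the pass@k figures were computed. As a minimal sketch, assuming the commonly used unbiased estimator from code-generation benchmarks (n attempts per query, c of them correct, k attempts drawn), the metric could be computed as follows. The function names and the sample data are hypothetical and are not taken from LEAP.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k attempts, drawn without
    replacement from n attempts of which c are correct, is correct.

    Assumption: this is the standard unbiased pass@k estimator; the
    paper may define the metric differently.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any draw of k
        # attempts must contain at least one correct attempt.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over queries; each item is (n_attempts, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-query outcomes, for illustration only.
example = [(3, 3), (3, 1), (3, 2), (3, 0)]
print(mean_pass_at_k(example, k=1))
print(mean_pass_at_k(example, k=3))
```

Under this definition, pass@3 with three attempts per query is simply the fraction of queries with at least one correct attempt, while pass@1 is the average per-attempt success rate.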
Abstract
Social scientists are increasingly interested in analyzing the semantic information (e.g., emotion) of unstructured data (e.g., Tweets), where the semantic information is not natively present. Performing this analysis in a cost-efficient manner requires using machine learning (ML) models to extract the semantic information and subsequently analyze the now structured data. However, this process remains challenging for domain experts. To demonstrate the challenges in social science analytics, we collect a dataset, QUIET-ML, of 120 real-world social science queries in natural language and their ground truth answers. Existing systems struggle with these queries since (1) they require selecting and applying ML models, and (2) more than a quarter of these queries are vague, making standard tools like natural language to SQL systems unsuited. To address these issues, we develop LEAP, an…
