LLMs in social services: How does chatbot accuracy affect human accuracy?

Jennah Gosciak; Eric Giannella; Zhaowen Guo; Michael Chen; Allison Koenecke

arXiv:2603.11213·cs.HC·March 13, 2026

TL;DR

This study evaluates how the accuracy of LLM-based chatbots influences caseworkers' ability to provide correct guidance in social services. Higher chatbot accuracy significantly improves caseworker performance, but incorrect chatbot suggestions can lower it.

Contribution

The paper introduces a benchmark dataset of complex social service questions and experimentally demonstrates how chatbot accuracy levels impact caseworker performance in a real-world setting.

Findings

01. High-quality chatbots (96-100% accuracy) improve caseworker accuracy by 27 percentage points.

02. Incorrect chatbot suggestions can reduce caseworker accuracy by two-thirds on easy questions.

03. Caseworker performance gains plateau at high chatbot accuracy levels, indicating a limit to human reliance.
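
The findings above mix two kinds of change: an absolute gain in percentage points and a relative drop ("by two-thirds"). The sketch below shows how the two are computed. The function names and all accuracy values are hypothetical, chosen only to make the arithmetic visible; they are not figures or code from the paper.

```python
def percentage_point_change(before: float, after: float) -> float:
    """Absolute change between two accuracies, expressed in percentage points."""
    return (after - before) * 100


def relative_change(before: float, after: float) -> float:
    """Change as a fraction of the starting accuracy (negative means a drop)."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before


# Hypothetical accuracies for illustration only (NOT taken from the paper).
control_accuracy = 0.50        # assumed accuracy without chatbot suggestions
assisted_accuracy = 0.77       # assumed accuracy with a high-quality chatbot

easy_without_error = 0.90      # assumed accuracy on easy questions
easy_with_wrong_hint = 0.30    # assumed accuracy after an incorrect suggestion

gain = percentage_point_change(control_accuracy, assisted_accuracy)
drop = relative_change(easy_without_error, easy_with_wrong_hint)

print(f"Absolute gain: {gain:.1f} percentage points")
print(f"Relative change on easy questions: {drop:.1%}")
```

With these assumed inputs, the first result is an absolute difference between two accuracy rates, while the second expresses the drop as a share of the starting accuracy. The same pair of accuracies can therefore look very different depending on which measure is reported.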

Abstract

Social service programs like the Supplemental Nutrition Assistance Program (SNAP, or food stamps) have eligibility rules that can be challenging to understand. For nonprofit caseworkers, who often support clients in navigating a dozen or more complex programs, LLM-based chatbots may offer a means to provide better, faster help to clients whose situations may be less common. In this paper, we measure the potential effects of LLM-based chatbot suggestions on caseworkers' ability to provide accurate guidance. We first created a 770-question multiple-choice benchmark dataset of difficult but realistic questions that a caseworker might receive. Next, using these benchmark questions and corresponding expert-verified answers, we conducted a randomized experiment with caseworkers recruited from nonprofit outreach organizations in Los Angeles. Caseworkers in the control condition did not see…

Taxonomy

Topics: AI in Service Interactions · Digital Mental Health Interventions · Spreadsheets and End-User Computing