YIELD: A Large-Scale Dataset and Evaluation Framework for Information Elicitation Agents

Victor De Lima; Grace Hui Yang

arXiv:2604.10968·cs.CL·April 14, 2026

TL;DR

This paper introduces YIELD, a large-scale dataset and evaluation framework for developing and assessing information elicitation agents in institutional decision-making contexts, supported by experiments with foundation LLMs.

Contribution

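The paper presents YIELD, a 26-million-token dataset of 2,281 human-to-human dialogues, and formalizes information elicitation as a finite-horizon POMDP. Together these enable systematic research on information elicitation agents and training that brings language models closer to real elicitation behavior.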
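
For readers unfamiliar with the term, the following is the generic textbook form of a finite-horizon POMDP, given only as orientation. The symbols are assumptions chosen for illustration; the paper's own states, actions, observations, and reward for the elicitation setting are not reproduced on this page and may differ.

```latex
% Generic finite-horizon POMDP (textbook form, not the paper's instantiation)
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R, \Omega, O, H), \qquad
T(s' \mid s, a), \quad O(o \mid s', a), \quad R(s, a)
\]
% The agent never observes s_t directly; it acts on observations o_t \in \Omega
% and seeks a policy maximizing expected total reward over H steps.
\[
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{H-1} R(s_t, a_t) \right]
\]
```

In an elicitation reading of this form, the hidden state would plausibly include what the user knows but has not yet said, the actions would be the agent's utterances, and the observations would be the user's replies; this mapping is an interpretation, not a statement of the authors' definitions.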

Findings

01. Training on YIELD enhances model alignment with elicitation behavior.

02. Human evaluation supports the effectiveness of models trained on YIELD.

03. The dataset and tools are publicly available for research use.

Abstract

Most conversational agents (CAs) are designed to satisfy user needs through user-driven interactions. However, many real-world settings, such as academic interviewing, judicial proceedings, and journalistic investigations, involve broader institutional decision-making processes and require agents that can elicit information from users. In this paper, we introduce Information Elicitation Agents (IEAs), in which the agent's goal is to elicit information from users to support the agent's institutional or task-oriented objectives. To enable systematic research on this setting, we present YIELD, a 26M-token dataset of 2,281 ethically sourced, human-to-human dialogues. Moreover, we formalize information elicitation as a finite-horizon POMDP and propose novel metrics tailored to IEAs. Pilot experiments on multiple foundation LLMs show that training on YIELD improves their alignment with real…

Code & Models

Repositories

infosenselab/yield (GitHub)

Models

🤗 infosense/yield-adapters (model)

Datasets

infosense/yield (dataset, 89 downloads)