YRC-Bench: A Benchmark for Learning to Coordinate with Experts

Mohamad H. Danesh; Nguyen X. Khanh; Tu Trinh; Benjamin Plaut

arXiv:2502.09583 · cs.LG · January 14, 2026

TL;DR

YRC-Bench introduces a benchmark for training AI agents to recognize when to seek expert help in new environments without prior expert interaction, promoting safer and more reliable autonomous decision-making.

Contribution

The paper presents YRC-Bench, an open-source benchmark for the novel YRC-0 problem, enabling research on unsupervised learning to coordinate with experts in diverse environments.

Findings

1. Proposed a validation strategy for YRC-0
2. Developed a proposer-validator diagnostic framework
3. Provided baseline implementations and an evaluation pipeline

Abstract

When deployed in the real world, AI agents will inevitably face challenges that exceed their individual capabilities. A critical component of AI safety is an agent's ability to recognize when it is likely to fail in a novel situation and to yield control to a more capable expert system. Leveraging such expert assistance can significantly improve safety and performance in such situations. Since expert assistance is costly, a central challenge is determining when to consult an expert. In this paper, we explore a novel variant of this problem, termed YRC-0, in which an agent must learn to collaborate with an expert in new environments in an unsupervised manner, that is, without interacting with the expert during training. This setting motivates the development of low-cost, robust approaches for training expert-leveraging agents. To support research in this area, we introduce YRC-Bench, an…
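To make the "when to consult an expert" decision concrete, here is a minimal sketch of one generic heuristic: the agent yields control whenever its own top action probability falls below a threshold. This is an illustration only and is not taken from the paper or the yrc-bench repository; the function names (`should_yield`, `coordinate`) and the default threshold of 0.7 are hypothetical. The decision rule reads only the agent's own logits, so it could in principle be tuned without querying an expert during training, which is the constraint the abstract describes.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def should_yield(agent_logits: List[float], threshold: float = 0.7) -> bool:
    """Hypothetical rule: yield when the agent's top action probability
    is below `threshold` (an assumed value, not one from the paper)."""
    return max(softmax(agent_logits)) < threshold


def coordinate(
    agent_logits: List[float], expert_action: int, threshold: float = 0.7
) -> Tuple[int, bool]:
    """Return (action, consulted_expert) for a single decision step.

    `expert_action` stands in for whatever the expert would do at
    deployment time; the rule itself never inspects it.
    """
    if should_yield(agent_logits, threshold):
        return expert_action, True
    probs = softmax(agent_logits)
    return probs.index(max(probs)), False


if __name__ == "__main__":
    # Confident agent: acts on its own.
    print(coordinate([5.0, 0.0, 0.0], expert_action=2))
    # Uncertain agent: hands control to the expert.
    print(coordinate([0.1, 0.0, 0.05], expert_action=2))
```

Raising the threshold makes the agent consult the expert more often, which trades higher expert cost for fewer unassisted decisions; choosing that trade-off without expert access is the difficulty the YRC-0 setting highlights.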

Code & Models

Repositories

modanesh/yrc-bench (PyTorch, official)

Taxonomy

Topics: Complex Systems and Decision Making · Biomedical and Engineering Education · Big Data and Business Intelligence