SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations

Shuaiqi Wang; Aadyaa Maddi; Zinan Lin; Giulia Fanti

arXiv:2605.22564·cs.CL·May 22, 2026

TL;DR

SynAE is a framework for evaluating the quality of synthetic datasets used to test tool-calling agents. It measures how well synthetic data replicates the characteristics of real data along three axes: validity, fidelity, and diversity.

Contribution

This work introduces SynAE, a novel multi-metric evaluation framework for synthetic data quality in tool-calling agent assessments, addressing limitations of single-metric approaches.

Findings

1. SynAE effectively detects variations in data validity, fidelity, and diversity.

2. No single metric suffices to fully characterize synthetic data quality.

3. Multi-axis evaluation provides a more comprehensive assessment of synthetic data.

Abstract

Today, tool-calling agents are commonly evaluated or tested on static datasets of execution traces, including input commands, agent responses, and associated tool calls. However, internal production datasets are often insufficient or unusable for testing; for example, they may contain sensitive or proprietary data, or they may be too sparse to support comprehensive testing (especially pre-deployment). In these settings, practitioners are increasingly replacing or augmenting real datasets with synthetic ones for evaluation purposes. A key challenge is quantifying the relation between these synthetic datasets and the real data. We introduce SynAE, an evaluation framework for assessing how well synthetic benchmarks for multi-turn, tool-calling agents replicate and augment the characteristics of real data trajectories. SynAE assesses the validity, fidelity, and diversity of synthetic data…
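
To make the setting concrete, here is a minimal sketch of how an execution trace (input commands, agent responses, and associated tool calls) might be represented, together with one simple way to compare a real and a synthetic dataset. Everything here is an assumption for illustration: the class names `ToolCall`, `Turn`, and `Trace`, the helper functions, and the choice of total variation distance over tool-usage frequencies are hypothetical and are not taken from SynAE, whose actual metrics are not described in this summary.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str                                   # hypothetical tool identifier
    arguments: dict = field(default_factory=dict)


@dataclass
class Turn:
    user_input: str                             # input command for this turn
    agent_response: str                         # agent's textual reply
    tool_calls: list = field(default_factory=list)


@dataclass
class Trace:
    turns: list                                 # multi-turn trajectory


def tool_distribution(traces):
    """Empirical frequency of each tool name across a dataset of traces."""
    counts = Counter(
        call.name
        for trace in traces
        for turn in trace.turns
        for call in turn.tool_calls
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data (invented for this example, not from the paper).
real = [Trace([Turn("book a flight", "Booked.",
                    [ToolCall("search"), ToolCall("book")])])]
synthetic = [Trace([Turn("find flights", "Here are some options.",
                         [ToolCall("search"), ToolCall("search")])])]

distance = total_variation(tool_distribution(real), tool_distribution(synthetic))
print(distance)  # 0.5
```

A single number like this captures only one narrow aspect of similarity (which tools are called, and how often). It says nothing about whether the synthetic traces are well-formed or varied, which is consistent with the paper's point that no single metric suffices.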

Code & Models

Repositories

wsqwsq/SynAE (GitHub)
