Generating Expressive and Customizable Evals for Timeseries Data Analysis Agents with AgentFuel
Aadyaa Maddi, Prakhar Naval, Deepti Mande, Shane Duan, Muckai Girish, Vyas Sekar

TL;DR
This paper introduces AgentFuel, a tool for generating domain-specific, expressive evaluation datasets for timeseries data analysis agents. It targets two expressivity gaps in existing evals: domain-customized datasets and domain-specific query types.
Contribution
AgentFuel enables domain experts to quickly generate customized evals for timeseries data agents, filling expressivity gaps in existing evaluation methods.
Findings
The 6 popular data analysis agents evaluated (open-source and proprietary) fail on stateful and incident-specific queries.
AgentFuel's benchmarks reveal key areas for framework improvement.
Using AgentFuel can enhance agent performance, as shown with GEPA.
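To make the term "stateful query" concrete, here is a minimal sketch. This summary does not define the term, so the reading used here is an assumption: a query whose answer depends on the order of events per entity, not on a simple aggregate. The event log, the `slow_recoveries` helper, and the 10-minute threshold are all hypothetical and do not come from AgentFuel.

```python
from datetime import datetime, timedelta

# Hypothetical event log: (timestamp, device_id, status). Not taken from the paper.
EVENTS = [
    (datetime(2024, 1, 1, 9, 0), "dev-a", "ok"),
    (datetime(2024, 1, 1, 9, 5), "dev-a", "error"),
    (datetime(2024, 1, 1, 9, 8), "dev-a", "ok"),
    (datetime(2024, 1, 1, 9, 0), "dev-b", "ok"),
    (datetime(2024, 1, 1, 9, 10), "dev-b", "error"),
    (datetime(2024, 1, 1, 9, 40), "dev-b", "ok"),
    (datetime(2024, 1, 1, 9, 15), "dev-c", "error"),
]


def slow_recoveries(events, limit):
    """Return sorted device ids that entered 'error' and did not return to 'ok' within `limit`.

    Devices still in the error state at the end of the log are also returned.
    """
    error_since = {}  # device -> time it first entered the current error state
    flagged = set()
    for ts, device, status in sorted(events):  # process in timestamp order
        if status == "error":
            # Keep the earliest error time for the ongoing error state.
            error_since.setdefault(device, ts)
        elif status == "ok" and device in error_since:
            # Recovery: measure how long the device stayed in error.
            if ts - error_since.pop(device) > limit:
                flagged.add(device)
    # Anything left never recovered within the log.
    flagged.update(error_since)
    return sorted(flagged)


result = slow_recoveries(EVENTS, timedelta(minutes=10))
print(result)
```

Answering this question requires tracking each device's state across events. A per-device count of error rows, by contrast, is a stateless aggregate and would not distinguish the quick recovery of `dev-a` from the slow recovery of `dev-b`.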
Abstract
Across many domains (e.g., IoT, observability, telecommunications, cybersecurity), there is an emerging adoption of conversational data analysis agents that enable users to "talk to your data" to extract insights. Such data analysis agents operate on timeseries data models; e.g., measurements from sensors or events monitoring user clicks and actions in product analytics. We evaluate 6 popular data analysis agents (both open-source and proprietary) on domain-specific data and query types, and find that they fail on stateful and incident-specific queries. We observe two key expressivity gaps in existing evals: domain-customized datasets and domain-specific query types. To enable practitioners in such domains to generate customized and expressive evals for such timeseries data agents, we present AgentFuel. AgentFuel helps domain experts quickly create customized evals to perform end-to-end…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Mobile Crowdsensing and Crowdsourcing · Topic Modeling
