ChainBuddy: An AI Agent System for Generating LLM Pipelines
Jingyue Zhang, Ian Arawjo

TL;DR
ChainBuddy is an AI assistant integrated into ChainForge that helps users generate and evaluate LLM pipelines from a single prompt, making the process more accessible and less demanding.
Contribution
This paper introduces ChainBuddy, a novel AI workflow generation assistant that simplifies creating LLM evaluation pipelines and demonstrates its effectiveness through user studies.
Findings
Participants reported lower workload and higher confidence with AI assistance.
Participants produced higher quality pipelines with AI assistance.
Experts rated workflows higher when AI assistance was used, revealing a mismatch between subjective and objective measures of performance.
Abstract
As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-defined tasks and craft effective pipelines to do so. Many users struggle with where to start, a difficulty often referred to as the "blank page problem." ChainBuddy, an AI workflow generation assistant built into the ChainForge platform, aims to tackle this issue. From a single prompt or chat, ChainBuddy generates a starter evaluative LLM pipeline in ChainForge aligned to the user's requirements. ChainBuddy offers a straightforward and user-friendly way to plan and evaluate LLM behavior, making the process less daunting and more accessible across a wide range of possible tasks and use cases. We report a within-subjects user study comparing ChainBuddy to the baseline interface. We find that when using AI assistance, participants reported a…
Taxonomy
Topics: Natural Language Processing Techniques
