LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation
Philipp Steigerwald, Mara Stieler, Jennifer Burghardt, Eric Rudolph, Jens Albrecht

TL;DR
LLARS is an open-source platform that facilitates collaboration between domain experts and developers in building, testing, and evaluating LLM-based systems through integrated modules and real-time tools.
Contribution
It introduces an end-to-end system with collaborative prompt engineering, batch generation, and hybrid evaluation, enhancing interdisciplinary collaboration and efficiency in LLM development.
Findings
Interviews confirmed LLARS is intuitive and saves time.
The platform enables seamless collaboration across disciplines.
Live evaluation metrics help identify optimal model-prompt pairs.
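As an illustration of what a live agreement metric between a human and an LLM evaluator could look like, here is a minimal sketch of Cohen's kappa. The summary does not say which agreement metrics LLARS computes, so the choice of kappa, the function name `cohen_kappa`, and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items.

    Illustrative only: LLARS may use a different agreement metric.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of four generated outputs.
human = ["good", "good", "bad", "bad"]
llm = ["good", "bad", "bad", "bad"]
kappa = cohen_kappa(human, llm)
```

Computing such a score per model-prompt pair, and recomputing it as ratings arrive, is one way a dashboard could show where human and LLM evaluators diverge.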
Abstract
We demonstrate LLARS (LLM Assisted Research System), an open-source platform that bridges the gap between domain experts and developers for building LLM-based systems. It integrates three tightly connected modules into an end-to-end pipeline: Collaborative Prompt Engineering for real-time co-authoring with version control and instant LLM testing, Batch Generation for configurable output production across user-selected prompts, models, and data with cost control, and Hybrid Evaluation where human and LLM evaluators jointly assess outputs through diverse assessment methods, with live agreement metrics and provenance analysis to identify the best model-prompt combination for a given use case. New prompts and models are automatically available for batch generation and completed batches can be turned into evaluation scenarios with a single click. Interviews with six domain…
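The Batch Generation step described in the abstract can be pictured as enumerating every combination of selected prompts, models, and data records while staying under a spending cap. The sketch below is a hypothetical illustration, not the LLARS implementation: `plan_batch`, the flat per-call cost table, and the stop-at-budget rule are assumptions.

```python
from itertools import product


def plan_batch(prompts, models, records, cost_per_call, budget):
    """Enumerate (prompt, model, record) jobs until the budget is used up.

    Hypothetical helper: assumes a flat, known cost per call for each model.
    """
    jobs = []
    spent = 0.0
    for prompt, model, record in product(prompts, models, records):
        cost = cost_per_call[model]
        if spent + cost > budget:
            # Stop scheduling once the next call would exceed the cap.
            break
        jobs.append((prompt, model, record))
        spent += cost
    return jobs, spent


# Placeholder identifiers and costs, chosen only for the example.
jobs, spent = plan_batch(
    prompts=["p1", "p2"],
    models=["m1", "m2"],
    records=["r1"],
    cost_per_call={"m1": 1.0, "m2": 2.0},
    budget=4.0,
)
```

With these inputs, three of the four possible jobs fit within the budget; a real system would more likely estimate cost from token counts than from a fixed per-call price.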
