# SHERPA: A Model-Driven Framework for Large Language Model Execution

**Authors:** Boqi Chen, Kua Chen, José Antonio Hernández López, Gunter Mussbacher, Dániel Varró, Amir Feizpour

arXiv: 2509.00272 · 2025-09-03

## TL;DR

SHERPA is a model-driven framework that encodes domain-specific best practices as hierarchical state machines to control LLM execution, improving output quality on complex tasks such as code generation, class name generation, and question answering.

## Contribution

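The paper presents SHERPA, a framework that structures LLM execution with hierarchical state machines. This gives fine-grained control over model behavior through rules or machine learning-driven decisions (including LLMs) and improves performance on complex tasks that depend on domain-specific best practices.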

## Key findings

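- Well-designed state machines significantly improve LLM output quality compared with baselines that use no state machine.
- SHERPA replicates previously proposed approaches for code generation, class name generation, and question answering, and further improves their performance.
- Effectiveness is demonstrated with various LLMs; the approach is particularly beneficial for complex tasks with well-established human best practices but little training data.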

## Abstract

Recently, large language models (LLMs) have achieved widespread application across various fields. Despite their impressive capabilities, LLMs suffer from a lack of structured reasoning ability, particularly for complex tasks requiring domain-specific best practices, which are often unavailable in the training data. Although multi-step prompting methods incorporating human best practices, such as chain-of-thought and tree-of-thought, have gained popularity, they lack a general mechanism to control LLM behavior. In this paper, we propose SHERPA, a model-driven framework to improve LLM performance on complex tasks by explicitly incorporating domain-specific best practices into hierarchical state machines. By structuring the LLM execution processes using state machines, SHERPA enables more fine-grained control over their behavior via rules or decisions driven by machine learning-based approaches, including LLMs. We show that SHERPA is applicable to a wide variety of tasks (specifically, code generation, class name generation, and question answering), replicating previously proposed approaches while further improving the performance. We demonstrate the effectiveness of SHERPA for the aforementioned tasks using various LLMs. Our systematic evaluation compares different state machine configurations against baseline approaches without state machines. Results show that integrating well-designed state machines significantly improves the quality of LLM outputs, and is particularly beneficial for complex tasks that have well-established human best practices but lack data for training LLMs.
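To make the idea of structuring LLM execution with a hierarchical state machine more concrete, here is a minimal Python sketch. It is not SHERPA's actual API: the `State` and `StateMachine` classes, the `stub_llm` function (a canned stand-in for a real model call), and the draft/test/repair workflow are all hypothetical names chosen for illustration. The sketch assumes a simple best practice for code generation (draft, test, repair until the test passes) and uses rule-based guards to choose transitions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Guard = Callable[[Context], bool]


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned code."""
    if prompt.startswith("repair"):
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"  # deliberately buggy first draft


@dataclass
class State:
    name: str
    action: Callable[[Context], Any]
    # Ordered (guard, target) pairs; the first guard that holds wins.
    transitions: List[Tuple[Guard, str]] = field(default_factory=list)


class StateMachine:
    def __init__(self, states: List[State], initial: str, final: str):
        self.states = {s.name: s for s in states}
        self.initial, self.final = initial, final

    def run(self, ctx: Context, max_steps: int = 20) -> Context:
        current = self.initial
        for _ in range(max_steps):
            ctx.setdefault("trace", []).append(current)
            if current == self.final:
                return ctx
            state = self.states[current]
            state.action(ctx)
            # Rule-based decision: follow the first transition whose guard holds.
            current = next(t for guard, t in state.transitions if guard(ctx))
        raise RuntimeError("step budget exhausted")


def draft(ctx: Context) -> None:
    ctx["code"] = stub_llm("draft: " + ctx["task"])


def run_tests(ctx: Context) -> None:
    namespace: Dict[str, Any] = {}
    exec(ctx["code"], namespace)  # acceptable here only because the code is canned
    ctx["passed"] = namespace["add"](2, 3) == 5


def repair(ctx: Context) -> None:
    ctx["code"] = stub_llm("repair: " + ctx["code"])
    ctx["attempts"] = ctx.get("attempts", 0) + 1


def always(ctx: Context) -> bool:
    return True


# Nested machine: the test/repair loop, capped at two repair attempts.
refine = StateMachine(
    states=[
        State("Test", run_tests, [
            (lambda c: c["passed"], "Done"),
            (lambda c: c.get("attempts", 0) >= 2, "Done"),
            (always, "Repair"),
        ]),
        State("Repair", repair, [(always, "Test")]),
    ],
    initial="Test",
    final="Done",
)

# Top-level machine: "Refine" is a composite state that runs the nested machine.
top = StateMachine(
    states=[
        State("Draft", draft, [(always, "Refine")]),
        State("Refine", refine.run, [(always, "End")]),
    ],
    initial="Draft",
    final="End",
)

result = top.run({"task": "add two integers"})
print(result["trace"])
```

In this sketch the workflow lives in the machine definitions rather than in the prompts, so changing the practice (for example, raising the repair cap or adding a review state) means editing states and guards. A guard could also delegate its decision to a model instead of a fixed rule, which is one way to read the abstract's "decisions driven by machine learning-based approaches".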

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00272/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00272/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/2509.00272/full.md

---
Source: https://tomesphere.com/paper/2509.00272