Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications

Felix Härer

arXiv:2506.10467 · cs.CR · July 22, 2025

TL;DR

This paper introduces a multi-agent LLM system specification language and prototype, demonstrating its application to cybersecurity tasks and enabling systematic evaluation of LLMs and reasoning techniques in complex, domain-specific scenarios.

Contribution

It presents a novel agent schema language, system architecture, and prototype for multi-agent LLM systems, facilitating their specification, execution, and evaluation in cybersecurity applications.
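To make the idea of specifying, executing, and evaluating a multi-agent system more concrete, the following is a minimal sketch in Python. It is not the paper's schema language or prototype: every name here (`AgentSpec`, `SystemSpec`, `evaluate`, `stub_runner`, the model and technique labels, and the task types) is a hypothetical stand-in chosen for illustration, and the LLM call is replaced by a stub that returns canned answers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical agent description; field names are illustrative only."""
    name: str
    model: str                    # placeholder model identifier
    technique: str                # placeholder reasoning-technique label
    task_types: Tuple[str, ...]   # task categories this agent accepts


@dataclass
class SystemSpec:
    """Hypothetical system description: a list of agents plus routing."""
    agents: List[AgentSpec] = field(default_factory=list)

    def validate(self) -> None:
        # Reject specifications in which two agents share a name.
        names = [a.name for a in self.agents]
        if len(names) != len(set(names)):
            raise ValueError("agent names must be unique")

    def route(self, task_type: str) -> AgentSpec:
        # Return the first agent that declares support for the task type.
        for agent in self.agents:
            if task_type in agent.task_types:
                return agent
        raise LookupError(f"no agent handles {task_type!r}")


Task = Tuple[str, str, str]  # (task_type, prompt, expected_answer)


def evaluate(spec: SystemSpec, tasks: List[Task],
             run_agent: Callable[[AgentSpec, str], str]) -> float:
    """Return the fraction of tasks answered exactly as expected."""
    spec.validate()
    if not tasks:
        return 0.0
    correct = 0
    for task_type, prompt, expected in tasks:
        agent = spec.route(task_type)
        if run_agent(agent, prompt) == expected:
            correct += 1
    return correct / len(tasks)


def stub_runner(agent: AgentSpec, prompt: str) -> str:
    # Stand-in for a real LLM call (assumption): looks up a canned answer.
    canned = {"Which port does SSH use by default?": "22"}
    return canned.get(prompt, "unknown")


spec = SystemSpec(agents=[
    AgentSpec("qa", "model-a", "technique-a", ("question_answering",)),
    AgentSpec("server", "model-b", "technique-b", ("server_security",)),
])
tasks = [
    ("question_answering", "Which port does SSH use by default?", "22"),
    ("server_security", "Is password login disabled?", "yes"),
]
score = evaluate(spec, tasks, stub_runner)
print(score)
```

Keeping the specification separate from the runner means the same task list could be re-run with a different model or reasoning technique per agent, which is the kind of systematic comparison the summary describes. A real system would replace `stub_runner` with actual model calls and exact-match scoring with a task-appropriate metric.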

Findings

01. Feasibility of multi-agent architecture demonstrated with cybersecurity tasks
02. Agents successfully completed question answering, server, and network security tasks
03. Evaluation approach enables systematic assessment of LLMs and reasoning techniques

Abstract

Recent advancements in LLMs indicate potential for novel applications, as evidenced by the reasoning capabilities in the latest OpenAI and DeepSeek models. To apply these models to domain-specific applications beyond text generation, LLM-based multi-agent systems can be utilized to solve complex tasks, particularly by combining reasoning techniques, code generation, and software execution across multiple, potentially specialized LLMs. However, while many evaluations are performed on LLMs, reasoning techniques, and applications individually, their joint specification and combined application are not well understood. Defined specifications for multi-agent LLM systems are required to explore their potential and suitability for specific applications, allowing for systematic evaluations of LLMs, reasoning techniques, and related aspects. This paper reports the results of exploratory research…

Code & Models

Repositories

fhaer/multi-agent-llm-system (official)

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management · Access Control and Trust