Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications
Felix Härer

TL;DR
This paper introduces a multi-agent LLM system specification language and prototype, demonstrating its application to cybersecurity tasks and enabling systematic evaluation of LLMs and reasoning techniques in complex, domain-specific scenarios.
Contribution
It presents a novel agent schema language, system architecture, and prototype for multi-agent LLM systems, facilitating their specification, execution, and evaluation in cybersecurity applications.
Findings
Feasibility of multi-agent architecture demonstrated with cybersecurity tasks
Agents successfully completed question answering, server, and network security tasks
Evaluation approach enables systematic assessment of LLMs and reasoning techniques
Abstract
Recent advancements in LLMs indicate potential for novel applications, as evidenced by the reasoning capabilities in the latest OpenAI and DeepSeek models. To apply these models to domain-specific applications beyond text generation, LLM-based multi-agent systems can be utilized to solve complex tasks, particularly by combining reasoning techniques, code generation, and software execution across multiple, potentially specialized LLMs. However, while many evaluations are performed on LLMs, reasoning techniques, and applications individually, their joint specification and combined application are not well understood. Defined specifications for multi-agent LLM systems are required to explore their potential and suitability for specific applications, allowing for systematic evaluations of LLMs, reasoning techniques, and related aspects. This paper reports the results of exploratory research…
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management · Access Control and Trust
