EngiAI: A Multi-Agent Framework and Benchmark Suite for LLM-Driven Engineering Design

Gioele Molinari; Florian Felten; Soheyl Massoudi; Mark Fuge

arXiv:2605.19743·cs.AI·May 20, 2026

TL;DR

This paper introduces a benchmark suite for evaluating LLM-driven engineering design along three dimensions (workflows, retrieval, and HPC), together with EngiAI, a reference multi-agent system implementation.

Contribution

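It presents a benchmark suite with three evaluation dimensions (workflow, retrieval-augmented generation, and HPC) and a multi-agent system framework for engineering design, addressing the lack of evaluation frameworks for multi-agent systems that combine simulation, retrieval, and manufacturing preparation.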

Findings

01

Proprietary models achieve 96-97% task completion on Beams2D.

02

Open-source 4B-parameter models reach 55-78% task completion.

03

Retrieval-augmented scores are near perfect with gating, validating the evaluation design.

Abstract

Large Language Model (LLM) agents are increasingly applied to engineering design tasks, yet existing evaluation frameworks do not adequately address multi-agent systems that combine simulation, retrieval, and manufacturing preparation. We introduce a benchmark suite with three evaluation dimensions: (1) a workflow benchmark with seven prompt styles targeting distinct cognitive demands, including direct tool use, semantic disambiguation, conditional branching, and working-memory tasks; (2) a Retrieval-Augmented Generation (RAG) benchmark with gated scoring that isolates retrieval contributions to parameter selection; and (3) a High-Performance Computing (HPC) benchmark evaluating end-to-end ML training orchestration on a SLURM cluster. Alongside the benchmark, we present EngiAI, a Multi-Agent System (MAS) reference implementation built on LangGraph that operationalizes the benchmark by…
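The abstract does not spell out how the gated scoring in the RAG benchmark works. The sketch below is one plausible reading, not the authors' method: credit for parameter selection is awarded only when the retrieval step actually surfaced a relevant document, so a correct value reached without retrieval scores zero. The names `RagTrial`, `gated_score`, and `mean_gated_score`, the 5% tolerance, and the document ids are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RagTrial:
    """One benchmark trial (hypothetical schema, not from the paper)."""
    retrieved_ids: set[str]   # documents the agent retrieved
    relevant_ids: set[str]    # documents that contain the needed parameter
    chosen_value: float       # parameter value the agent selected
    reference_value: float    # value stated in the relevant document
    rel_tol: float = 0.05     # assumed tolerance for "correct" selection


def gated_score(trial: RagTrial) -> float:
    """Return 1.0 only if retrieval hit AND the chosen value is within tolerance."""
    retrieval_hit = bool(trial.retrieved_ids & trial.relevant_ids)
    if not retrieval_hit:
        # Gate closed: a correct value without retrieval earns no credit.
        return 0.0
    error = abs(trial.chosen_value - trial.reference_value)
    return 1.0 if error <= trial.rel_tol * abs(trial.reference_value) else 0.0


def mean_gated_score(trials: list[RagTrial]) -> float:
    """Average gated score over a non-empty list of trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    return sum(gated_score(t) for t in trials) / len(trials)


if __name__ == "__main__":
    trials = [
        RagTrial({"doc_a"}, {"doc_a"}, 210.0, 210.0),  # hit, correct value
        RagTrial({"doc_b"}, {"doc_a"}, 210.0, 210.0),  # miss, correct value
        RagTrial({"doc_a"}, {"doc_a"}, 150.0, 210.0),  # hit, wrong value
    ]
    print(f"mean gated score: {mean_gated_score(trials):.3f}")
```

Under this reading, the second trial is the informative one: the value is right, but because no relevant document was retrieved the score is zero, which keeps a model's prior knowledge from being counted as a retrieval contribution.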
