AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction

Harsh Mankodiya; Chase Gallik; Theodoros Galanos; Andriy Mulyar

arXiv:2603.29199 · cs.AI · April 1, 2026

TL;DR

AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world architecture, engineering, and construction (AEC) tasks. The authors use it to identify tools and harness design techniques that improve performance across foundation models, and they release the dataset, agent harness, and evaluation code openly.

Contribution

The paper introduces a benchmark dataset, an evaluation protocol, and baseline results for assessing foundation model harnesses on AEC-specific tasks, with the code and data openly available.

Findings

01. Specific tools and harness design techniques improve performance consistently across foundation models running in their own base harnesses, such as Claude Code and Codex.

02. The benchmark covers tasks requiring drawing understanding, cross-sheet reasoning, and construction project-level coordination.

03. The dataset, agent harness, and evaluation code are openly released under an Apache 2 license, which supports reproducibility and further research.

Abstract

AEC-Bench is a multimodal benchmark for evaluating agentic systems on real-world tasks in the Architecture, Engineering, and Construction (AEC) domain. The benchmark covers tasks requiring drawing understanding, cross-sheet reasoning, and construction project-level coordination. This report describes the benchmark motivation, dataset taxonomy, evaluation protocol, and baseline results across several domain-specific foundation model harnesses. We use AEC-Bench to identify tools and harness design techniques that uniformly improve performance across foundation models in their own base harnesses, such as Claude Code and Codex. We openly release our benchmark dataset, agent harness, and evaluation code for full replicability at https://github.com/nomic-ai/aec-bench under an Apache 2 license.
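One way to read "uniformly improve performance across foundation models" is that a technique raises the score of every model it is applied to, not just the average. The sketch below illustrates that check only; it is not the paper's evaluation code. The function names, the score format (model name mapped to a success rate), and the numbers are all hypothetical placeholders, not results from AEC-Bench.

```python
from typing import Dict

# Hypothetical score format: model name -> task success rate in [0, 1].
Scores = Dict[str, float]


def per_model_delta(base: Scores, augmented: Scores) -> Dict[str, float]:
    """Score change per model, for models present in both runs."""
    shared = sorted(base.keys() & augmented.keys())
    return {model: augmented[model] - base[model] for model in shared}


def uniform_improvement(base: Scores, augmented: Scores) -> bool:
    """True only if every shared model scores strictly higher when augmented."""
    deltas = per_model_delta(base, augmented)
    if not deltas:
        raise ValueError("no models in common between the two runs")
    return all(delta > 0 for delta in deltas.values())


# Placeholder numbers for illustration only; these are not benchmark results.
base = {"model_a": 0.40, "model_b": 0.55}
with_tool = {"model_a": 0.50, "model_b": 0.60}

print(uniform_improvement(base, with_tool))
```

Requiring a strict gain for every model is a deliberately conservative reading; an analysis could instead allow ties or test significance per model.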

Code & Models

Repositories

nomic-ai/aec-bench
github

Datasets

nomic-ai/aec-bench
dataset· 158 dl
