Can Large Multimodal Models Inspect Buildings? A Hierarchical Benchmark for Structural Pathology Reasoning

Hui Zhong; Yichun Gao; Luyan Liu; Hai Yang; Wang Wang; Haowei Zhang; Xinhu Zheng

arXiv:2603.20148 · cs.CV · March 23, 2026

TL;DR

This paper introduces DefectBench, a hierarchical benchmark for evaluating large multimodal models (LMMs) on building inspection tasks. It highlights their strengths in semantic understanding and topological reasoning, as well as their limitations in localization.

Contribution

It presents a standardized, multi-dimensional benchmark and dataset for assessing LMMs in structural pathology reasoning, advancing evaluation standards in civil engineering AI applications.

Findings

01  LMMs excel in semantic perception and topological awareness.

02  LMMs show significant gaps in metric localization accuracy.

03  Zero-shot generative segmentation can match supervised models without domain training.

Abstract

Automated building facade inspection is a critical component of urban resilience and smart city maintenance. Traditionally, this field has relied on specialized discriminative models (e.g., YOLO, Mask R-CNN) that excel at pixel-level localization but are constrained to passive perception and generalize poorly, lacking the visual understanding needed to interpret structural topology. Large Multimodal Models (LMMs) promise a paradigm shift toward active reasoning, yet their application in such high-stakes engineering domains lacks rigorous evaluation standards. To bridge this gap, we introduce a human-in-the-loop semi-automated annotation framework, leveraging expert-proposal verification to unify 12 fragmented datasets into a standardized, hierarchical ontology. Building on this foundation, we present DefectBench, the first multi-dimensional benchmark designed to interrogate LMMs beyond…

Taxonomy

Topics: Advanced Neural Network Applications · 3D Surveying and Cultural Heritage · Infrastructure Maintenance and Monitoring