Can Large Multimodal Models Inspect Buildings? A Hierarchical Benchmark for Structural Pathology Reasoning
Hui Zhong, Yichun Gao, Luyan Liu, Hai Yang, Wang Wang, Haowei Zhang, Xinhu Zheng

TL;DR
This paper introduces DefectBench, a hierarchical benchmark for evaluating large multimodal models (LMMs) on building inspection tasks. It highlights their strengths in semantic understanding and topological reasoning, as well as their limitations in localization.
Contribution
It presents a standardized, multi-dimensional benchmark and dataset for assessing LMMs in structural pathology reasoning, advancing evaluation standards in civil engineering AI applications.
Findings
LMMs excel in semantic perception and topological awareness.
LMMs show significant gaps in metric localization accuracy.
Zero-shot generative segmentation can match supervised models without domain training.
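To make the "metric localization" finding concrete, here is a minimal sketch of how localization accuracy is commonly scored for a predicted bounding box against a ground-truth box. The summary above does not state which metric DefectBench uses; intersection-over-union (IoU) is assumed here purely for illustration, and the function name `box_iou` and the box layout are hypothetical.

```python
from typing import Tuple

# Assumed box layout (not from the paper): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(pred: Box, truth: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix_min = max(pred[0], truth[0])
    iy_min = max(pred[1], truth[1])
    ix_max = min(pred[2], truth[2])
    iy_max = min(pred[3], truth[3])

    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0
```

A model can name the right defect type (semantic perception) and still score a low IoU if its box is shifted or mis-sized, which is the kind of gap the second finding describes.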
Abstract
Automated building facade inspection is a critical component of urban resilience and smart city maintenance. Traditionally, this field has relied on specialized discriminative models (e.g., YOLO, Mask R-CNN) that excel at pixel-level localization but are constrained to passive perception, generalize poorly, and lack the visual understanding needed to interpret structural topology. Large Multimodal Models (LMMs) promise a paradigm shift toward active reasoning, yet their application in such high-stakes engineering domains lacks rigorous evaluation standards. To bridge this gap, we introduce a human-in-the-loop semi-automated annotation framework, leveraging expert-proposal verification to unify 12 fragmented datasets into a standardized, hierarchical ontology. Building on this foundation, we present DefectBench, the first multi-dimensional benchmark designed to interrogate LMMs beyond…
Taxonomy
Topics: Advanced Neural Network Applications · 3D Surveying and Cultural Heritage · Infrastructure Maintenance and Monitoring
