Multimodal Modular Chain of Thoughts in Energy Performance Certificate Assessment
Zhen Peng, Peter J. Bentley

TL;DR
This paper introduces a multimodal modular reasoning framework that uses Vision-Language models to automate Energy Performance Certificate (EPC) pre-assessment of buildings from limited visual data. On a dataset of 81 UK residential properties, it reports statistically significant improvements over instruction-only prompting.
Contribution
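The paper proposes the Multimodal Modular Chain of Thoughts (MMCoT) architecture, which decomposes EPC estimation into intermediate reasoning stages and propagates inferred attributes between stages through structured prompts. The aim is low-cost, data-efficient building energy evaluation.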
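The staged, attribute-propagating design described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stage names, the prompt wording, the `ask_model` callable, and the canned answers are all hypothetical, and the stub stands in for a real Vision-Language model that would also receive the property images.

```python
from typing import Callable, Dict, List

# Hypothetical stage names; the paper's actual reasoning stages are not listed here.
STAGES: List[str] = ["property_type", "construction_age", "glazing", "epc_band"]


def build_prompt(stage: str, known: Dict[str, str]) -> str:
    """Build a structured prompt that carries forward attributes inferred so far."""
    context = "\n".join(f"- {k}: {v}" for k, v in known.items()) or "- (none yet)"
    return (
        f"Task: estimate `{stage}` from the attached images.\n"
        f"Previously inferred attributes:\n{context}\n"
        "Answer with a single value."
    )


def run_chain(ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Run the stages in order, feeding each answer into the next prompt."""
    known: Dict[str, str] = {}
    for stage in STAGES:
        known[stage] = ask_model(build_prompt(stage, known)).strip()
    return known


# Stub model with made-up answers, so the sketch runs without any API access.
CANNED = {
    "property_type": "terraced house",
    "construction_age": "1930-1949",
    "glazing": "double glazing",
    "epc_band": "D",
}


def fake_model(prompt: str) -> str:
    stage = prompt.split("`")[1]  # the stage name sits between the first backticks
    return CANNED[stage]


if __name__ == "__main__":
    for name, value in run_chain(fake_model).items():
        print(f"{name}: {value}")
```

Passing the model in as a plain callable keeps the chain logic separate from any particular Vision-Language API, so swapping the stub for a real client would not change `run_chain`.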
Findings
MMCoT outperforms instruction-only prompting in EPC estimation
MMCoT captures the ordinal structure of EPC ratings
Most errors occur between adjacent EPC classes
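To make the ordinal reading of these findings concrete, the sketch below scores predictions by band distance. It assumes the UK A to G band scale and uses invented toy labels, not results from the paper; the function name and the `adjacent_error_share` metric are illustrative choices rather than the authors' evaluation code.

```python
from typing import Dict, List

BANDS = "ABCDEFG"  # UK EPC bands, ordered from most to least efficient


def ordinal_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy, mean absolute band error, and share of errors that are off by one band."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    dists = [abs(BANDS.index(t) - BANDS.index(p)) for t, p in zip(y_true, y_pred)]
    errors = [d for d in dists if d > 0]
    adjacent = sum(1 for d in errors if d == 1)
    return {
        "accuracy": dists.count(0) / len(dists),
        "mae": sum(dists) / len(dists),
        "adjacent_error_share": adjacent / len(errors) if errors else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    true_bands = ["C", "D", "D", "E", "B"]
    pred_bands = ["C", "C", "D", "G", "B"]
    print(ordinal_metrics(true_bands, pred_bands))
```

A high adjacent-error share together with a low mean absolute error is what one would expect from a model that respects the ordering of the bands even when it misses the exact class.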
Abstract
Accurate evaluation of building energy performance remains challenging in regions where scalable Energy Performance Certificate (EPC) assessments are unavailable. This paper presents a cost-efficient framework that leverages Vision-Language models for automated EPC pre-assessment from limited visual information. The proposed Multimodal Modular Chain of Thoughts (MMCoT) architecture decomposes EPC estimation into intermediate reasoning stages and explicitly propagates inferred attributes across tasks using structured prompting. Experiments on a multimodal dataset of 81 residential properties in the United Kingdom show that MMCoT achieves statistically significant improvements over instruction-only prompting for EPC estimation. Analysis based on accuracy, recall, mean absolute error, and confusion matrices indicates that the proposed approach captures the ordinal structure of EPC ratings,…
Taxonomy
Topics: Building Energy and Comfort Optimization · Smart Grid Energy Management · BIM and Construction Integration
