Demystifying the Potential of ChatGPT-4 Vision for Construction Progress Monitoring
Ahmet Bahaddin Ersoz

TL;DR
This paper investigates GPT-4 Vision's ability to analyze aerial images for construction progress monitoring, highlighting its strengths in identifying construction elements and discussing future improvements for industry application.
Contribution
It demonstrates GPT-4 Vision's capabilities in construction scene analysis and progress tracking, and explores future enhancements for better industry integration.
Findings
Proficient in identifying construction stages, materials, and machinery.
Challenges with precise object localization and segmentation.
Potential for future advancements with domain-specific training.
Abstract
The integration of Large Vision-Language Models (LVLMs) such as OpenAI's GPT-4 Vision into various sectors has marked a significant evolution in the field of artificial intelligence, particularly in the analysis and interpretation of visual data. This paper explores the practical application of GPT-4 Vision in the construction industry, focusing on its capabilities in monitoring and tracking the progress of construction projects. Utilizing high-resolution aerial imagery of construction sites, the study examines how GPT-4 Vision performs detailed scene analysis and tracks developmental changes over time. The findings demonstrate that while GPT-4 Vision is proficient in identifying construction stages, materials, and machinery, it faces challenges with precise object localization and segmentation. Despite these limitations, the potential for future advancements in this technology is…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsArtificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI
MethodsAttention Is All You Need · Linear Layer · Dense Connections · Residual Connection · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Dropout
