Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment
Blessing Agyei Kyem, Joshua Kofi Asamoah, Anthony Dontoh, Armstrong Aboah

TL;DR
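This paper introduces PaveInstruct, an instruction-tuning dataset for pavement condition assessment, and PaveGPT, a domain-specific vision-language model trained on it. PaveGPT substantially outperforms general-purpose vision-language models, produces ASTM D6433-compliant assessments, and makes infrastructure evaluation possible through a single conversational model.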
Contribution
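The work unifies annotations from nine heterogeneous pavement datasets into one large instruction-tuning dataset. It then trains a pavement foundation model on that dataset, and this model outperforms existing vision-language models on technical assessment tasks.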
Findings
Achieved over 20% improvement in spatial grounding, reasoning, and generation tasks.
Produced ASTM D6433-compliant outputs for pavement assessment.
Enabled deployment of unified conversational tools for infrastructure evaluation.
Abstract
General-purpose vision-language models demonstrate strong performance in everyday domains but struggle with specialized technical fields requiring precise terminology, structured reasoning, and adherence to engineering standards. This work addresses whether domain-specific instruction tuning can enable comprehensive pavement condition assessment through vision-language models. PaveInstruct, a dataset containing 278,889 image-instruction-response pairs spanning 32 task types, was created by unifying annotations from nine heterogeneous pavement datasets. PaveGPT, a pavement foundation model trained on this dataset, was evaluated against state-of-the-art vision-language models across perception, understanding, and reasoning tasks. Instruction tuning transformed model capabilities, achieving improvements exceeding 20% in spatial grounding, reasoning, and generation tasks while producing…
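Unifying annotations from heterogeneous datasets into image-instruction-response pairs broadly means mapping each source's label vocabulary onto a shared taxonomy and rewriting each annotation in a common format. The sketch below shows this for one hypothetical spatial-grounding task. The label codes, the `SourceBox` and `InstructionPair` schemas, the normalized-box convention, and the prompt wording are all illustrative assumptions. They are not PaveInstruct's actual format, which this summary does not describe.

```python
from dataclasses import dataclass, asdict
import json

# HYPOTHETICAL mapping from dataset-specific labels to one shared distress vocabulary.
# Illustrative only; the real PaveInstruct taxonomy is not given in this summary.
LABEL_MAP = {
    "D00": "longitudinal crack",
    "D10": "transverse crack",
    "D20": "alligator crack",
    "D40": "pothole",
    "pothole": "pothole",
    "crack_long": "longitudinal crack",
}


@dataclass
class SourceBox:
    """One bounding-box annotation as it might appear in a source dataset (hypothetical schema)."""
    image: str
    label: str      # dataset-specific label name
    x_min: float    # pixel coordinates
    y_min: float
    x_max: float
    y_max: float
    width: int      # image width in pixels
    height: int     # image height in pixels


@dataclass
class InstructionPair:
    """One unified image-instruction-response record (hypothetical schema)."""
    image: str
    task: str
    instruction: str
    response: str


def normalize_box(b: SourceBox) -> list[float]:
    """Scale pixel coordinates to [0, 1] so images of different sizes share one box format."""
    return [
        round(b.x_min / b.width, 3),
        round(b.y_min / b.height, 3),
        round(b.x_max / b.width, 3),
        round(b.y_max / b.height, 3),
    ]


def to_grounding_pair(b: SourceBox) -> InstructionPair:
    """Rewrite one source box as a spatial-grounding instruction-response pair."""
    distress = LABEL_MAP.get(b.label)
    if distress is None:
        # Refuse to guess: unmapped labels must be added to the taxonomy explicitly.
        raise ValueError(f"unmapped label: {b.label!r}")
    box = normalize_box(b)
    return InstructionPair(
        image=b.image,
        task="spatial_grounding",
        instruction=f"Locate the {distress} in this pavement image.",
        response=f"The {distress} is at {box}.",
    )


if __name__ == "__main__":
    pair = to_grounding_pair(SourceBox("img_001.jpg", "D40", 120, 300, 360, 480, 600, 600))
    print(json.dumps(asdict(pair)))
```

Raising an error on unmapped labels, rather than passing them through, is one way to keep the merged vocabulary consistent across sources. A full pipeline would need an analogous converter for each task type.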
