Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment

Blessing Agyei Kyem; Joshua Kofi Asamoah; Anthony Dontoh; Armstrong Aboah

arXiv:2604.08212·cs.CV·April 10, 2026


TL;DR

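This paper introduces PaveInstruct, an instruction-tuning dataset, and PaveGPT, a domain-specific vision-language model trained on it for pavement condition assessment. Instruction tuning yields improvements exceeding 20% in spatial grounding, reasoning, and generation tasks and produces ASTM D6433-compliant outputs, supporting simpler infrastructure evaluation through a single conversational tool.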

Contribution

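It unifies annotations from nine heterogeneous pavement datasets into PaveInstruct (278,889 image-instruction-response pairs spanning 32 task types) and trains PaveGPT, a pavement foundation model that outperforms existing vision-language models on technical assessment tasks.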

Findings

01

Achieved over 20% improvement in spatial grounding, reasoning, and generation tasks.

02

Produced ASTM D6433-compliant outputs for pavement assessment.

03

Enabled deployment of unified conversational tools for infrastructure evaluation.

Abstract

General-purpose vision-language models demonstrate strong performance in everyday domains but struggle with specialized technical fields requiring precise terminology, structured reasoning, and adherence to engineering standards. This work addresses whether domain-specific instruction tuning can enable comprehensive pavement condition assessment through vision-language models. PaveInstruct, a dataset containing 278,889 image-instruction-response pairs spanning 32 task types, was created by unifying annotations from nine heterogeneous pavement datasets. PaveGPT, a pavement foundation model trained on this dataset, was evaluated against state-of-the-art vision-language models across perception, understanding, and reasoning tasks. Instruction tuning transformed model capabilities, achieving improvements exceeding 20% in spatial grounding, reasoning, and generation tasks while producing…
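
As a rough illustration of what "unifying annotations" into image-instruction-response pairs could look like, the sketch below converts one detection-style bounding-box annotation into a spatial-grounding sample. The schema, the field names, the prompt wording, the response format, and the file name are all hypothetical; the source does not describe PaveInstruct's actual templates or storage format.

```python
from dataclasses import dataclass


@dataclass
class InstructionSample:
    """One image-instruction-response record (hypothetical schema)."""
    image: str
    task: str
    instruction: str
    response: str


def box_to_grounding_sample(image: str, label: str, box: tuple) -> InstructionSample:
    """Turn one detection-style annotation into a spatial-grounding sample.

    `box` is (x_min, y_min, x_max, y_max) in pixels. The prompt wording and
    response format are illustrative assumptions, not the paper's templates.
    """
    x_min, y_min, x_max, y_max = box
    return InstructionSample(
        image=image,
        task="spatial_grounding",
        instruction=f"Locate the {label} in this pavement image.",
        response=f"{label}: [{x_min}, {y_min}, {x_max}, {y_max}]",
    )


# Example annotation (made-up values, for illustration only).
sample = box_to_grounding_sample(
    "road_0001.jpg", "longitudinal crack", (34, 120, 410, 158)
)
print(sample.instruction)
print(sample.response)
```

Other task types (classification, severity rating, report generation) would need their own converters that emit the same record shape, which is what would let heterogeneous source datasets be merged into one training set.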
