Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems

Sanjita Prajapati; Tanu Singh; Chinmay Hegde; Pranamesh Chakraborty

arXiv:2409.02278·cs.CV·September 5, 2024·2 cites

Open Access · 1 Repo

TL;DR

This paper evaluates state-of-the-art vision language models for transportation engineering tasks like congestion detection, crack identification, and helmet violation detection, using zero-shot prompting to assess their performance without task-specific training.

Contribution

The paper compares several VLMs on transportation engineering tasks and highlights their strengths and limitations to inform future development.

Findings

01. VLMs perform comparably to CNNs in image classification.
02. Object localization with VLMs still requires improvement.
03. Zero-shot prompting enables task execution without annotated datasets.

Abstract

Recent developments in vision language models (VLM) have shown great potential for diverse applications related to image understanding. In this study, we have explored state-of-the-art VLM models for vision-based transportation engineering tasks such as image classification and object detection. The image classification task involves congestion detection and crack identification, whereas, for object detection, helmet violations were identified. We have applied open-source models such as CLIP, BLIP, OWL-ViT, Llava-Next, and closed-source GPT-4o to evaluate the performance of these state-of-the-art VLM models to harness the capabilities of language understanding for vision-based transportation tasks. These tasks were performed by applying zero-shot prompting to the VLM models, as zero-shot prompting involves performing tasks without any training on those tasks. It eliminates the need for…
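The zero-shot classification idea described in the abstract can be sketched roughly as follows. This is a minimal illustration of how a CLIP-style model scores an image against text prompts, not the authors' implementation: the embeddings are toy vectors standing in for real encoder outputs, the prompt wording is hypothetical, and the scale of 100 is an assumed value in the range typically used by CLIP-style models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_emb, prompt_embs, scale=100.0):
    """Return a probability per prompt using scaled cosine similarity + softmax.

    No task-specific training is involved: the class set is defined
    purely by the text prompts supplied at inference time.
    """
    labels = list(prompt_embs)
    logits = [scale * cosine(image_emb, prompt_embs[label]) for label in labels]
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Toy embeddings (assumption): a real pipeline would obtain these from
# the image and text encoders of a pretrained model.
prompt_embs = {
    "a photo of a congested road": [0.9, 0.1, 0.2],
    "a photo of a free-flowing road": [0.1, 0.9, 0.3],
}
image_emb = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_emb, prompt_embs)
predicted = max(probs, key=probs.get)
print(predicted)
```

Changing the task (for example, from congestion detection to crack identification) would only require swapping the prompt strings, which is what removes the need for an annotated training set.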

Code & Models

Repositories

unveilx/slm-od-ml-comparison
pytorch

Taxonomy

Topics: BIM and Construction Integration · Safety Warnings and Signage

Methods: BLIP: Bootstrapping Language-Image Pre-training · Contrastive Language-Image Pre-training