Quantized Vision-Language Models for Damage Assessment: A Comparative Study of LLaVA-1.5-7B Quantization Levels

Takato Yasuno

arXiv:2603.26770·cs.CV·March 31, 2026

TL;DR

This study evaluates quantized vision-language models for automated bridge damage assessment, balancing description quality, inference speed, and resource use, with Q5_K_M providing the best overall performance.

Contribution

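The paper systematically compares three quantization levels (Q4_K_M, Q5_K_M, and Q8_0) of LLaVA-1.5-7B for bridge damage assessment, introduces a 5-point quality evaluation framework, and identifies the quantization level with the best trade-off for deployment on consumer-grade GPUs.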

Findings

01. Q5_K_M achieves the best quality-speed trade-off.

02. Q5_K_M scores 8.5% higher on quality than Q4_K_M.

03. Q5_K_M's performance is consistent regardless of description length.

Abstract

Bridge infrastructure inspection is a critical but labor-intensive task requiring expert assessment of structural damage such as rebar exposure, cracking, and corrosion. This paper presents a comprehensive study of quantized Vision-Language Models (VLMs) for automated bridge damage assessment, focusing on the trade-offs between description quality, inference speed, and resource requirements. We develop an end-to-end pipeline combining LLaVA-1.5-7B for visual damage analysis, structured JSON extraction, and rule-based priority scoring. To enable deployment on consumer-grade GPUs, we conduct a systematic comparison of three quantization levels (Q4_K_M, Q5_K_M, and Q8_0) across 254 rebar exposure images. We introduce a 5-point quality evaluation framework that assesses criteria including damage type recognition and severity classification. Our results demonstrate that Q5_K_M achieves the optimal balance: quality…
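The abstract names structured JSON extraction and rule-based priority scoring as pipeline stages but gives no detail on either. As a rough illustration only, the sketch below shows one way such a stage could look in Python. The field names (`damage_type`, `severity`), the severity levels, the weights, and both function names are hypothetical and are not taken from the paper; the model call itself is left out and replaced by a hard-coded output string.

```python
import json

# Hypothetical weights; the paper's actual scoring rules are not stated in the abstract.
SEVERITY_SCORE = {"minor": 1, "moderate": 2, "severe": 3}
DAMAGE_BONUS = {"rebar_exposure": 2, "corrosion": 1, "cracking": 1}


def extract_json(text):
    """Return the first JSON object embedded in free-form model output, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores any trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None


def priority_score(record):
    """Rule-based priority from 0 (unparseable or unknown) to 5 (most urgent)."""
    if record is None:
        return 0
    severity = SEVERITY_SCORE.get(str(record.get("severity", "")).lower(), 0)
    bonus = DAMAGE_BONUS.get(str(record.get("damage_type", "")).lower(), 0)
    return min(severity + bonus, 5)


# Stand-in for a VLM response that wraps JSON in free text (assumed format).
raw_output = (
    'The image shows damage. '
    '{"damage_type": "rebar_exposure", "severity": "severe"} End of report.'
)
record = extract_json(raw_output)
print(record, priority_score(record))
```

Scanning for the first parseable object, rather than calling `json.loads` on the whole response, tolerates models that surround the JSON with explanatory prose, which is a common failure mode at lower quantization levels.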
