Quantized Vision-Language Models for Damage Assessment: A Comparative Study of LLaVA-1.5-7B Quantization Levels
Takato Yasuno

TL;DR
This study evaluates quantized vision-language models for automated bridge damage assessment, balancing description quality, inference speed, and resource use, with Q5_K_M providing the best overall performance.
Contribution
The study systematically compares three quantization levels of LLaVA-1.5-7B (Q4_K_M, Q5_K_M, and Q8_0) for bridge damage assessment, introduces a 5-point quality evaluation framework, and identifies the trade-off best suited to deployment on consumer-grade GPUs.
Findings
Q5_K_M achieves the best quality-speed trade-off.
Q5_K_M scores 8.5% higher on quality than Q4_K_M.
Q5_K_M's performance is consistent regardless of description length.
Abstract
Bridge infrastructure inspection is a critical but labor-intensive task requiring expert assessment of structural damage such as rebar exposure, cracking, and corrosion. This paper presents a comprehensive study of quantized Vision-Language Models (VLMs) for automated bridge damage assessment, focusing on the trade-offs between description quality, inference speed, and resource requirements. We develop an end-to-end pipeline combining LLaVA-1.5-7B for visual damage analysis, structured JSON extraction, and rule-based priority scoring. To enable deployment on consumer-grade GPUs, we conduct a systematic comparison of three quantization levels (Q4_K_M, Q5_K_M, and Q8_0) across 254 rebar exposure images. We introduce a 5-point quality evaluation framework whose criteria include damage type recognition and severity classification. Our results demonstrate that Q5_K_M achieves the optimal balance: quality…
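The last two pipeline stages named in the abstract, structured JSON extraction and rule-based priority scoring, can be illustrated with a minimal Python sketch. The abstract does not give the paper's JSON schema or scoring rules, so everything here is an assumption made for illustration: the field names `damage_type` and `severity`, the point tables, the thresholds, and the helper names `extract_json`, `priority_score`, and `priority_label` are all hypothetical.

```python
import json

# Hypothetical point tables; the paper's actual rules are not given in the abstract.
SEVERITY_POINTS = {"minor": 1, "moderate": 2, "severe": 3}
DAMAGE_POINTS = {"cracking": 1, "corrosion": 2, "rebar exposure": 3}


def extract_json(text):
    """Parse the span from the first '{' to the last '}' in free-form model output.

    Returns a dict, or None when no parsable JSON object is found.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def priority_score(record):
    """Sum the severity and damage-type points; unknown values contribute 0."""
    severity = str(record.get("severity", "")).lower()
    damage = str(record.get("damage_type", "")).lower()
    return SEVERITY_POINTS.get(severity, 0) + DAMAGE_POINTS.get(damage, 0)


def priority_label(score):
    """Map a numeric score to a coarse repair priority (assumed thresholds)."""
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# Example with a made-up model reply, not output from LLaVA-1.5-7B.
reply = 'Assessment: {"damage_type": "rebar exposure", "severity": "severe"} End.'
record = extract_json(reply)
if record is not None:
    score = priority_score(record)
    print(score, priority_label(score))
```

Keeping the scoring outside the model, as a plain lookup over extracted fields, means the priority rules can be audited and changed without re-running inference; whether the paper structures its rules this way is not stated in the abstract.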
