CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
Zhizhao Duan, Hao Cheng, Duo Xu, Xi Wu, Xiangxie Zhang, Xi Ye, Zhen Xie

TL;DR
CityLLaVA introduces an efficient fine-tuning framework for visual language models tailored to urban scenarios, significantly improving traffic safety analysis accuracy and achieving top benchmark performance.
Contribution
The paper presents a novel fine-tuning approach for VLMs in city environments, utilizing bounding boxes, prompt engineering, block expansion, and sequential questioning for enhanced urban traffic analysis.
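As a rough illustration of the block expansion idea, the toy sketch below assumes the variant in which copies of existing blocks are interleaved with the originals, the copies' output weights are zeroed so the expanded model initially computes the same function, and only the copies are trained. The scalar `Block`, `expand_blocks`, and `group_size` are hypothetical stand-ins for illustration, not the paper's implementation.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    """Toy residual block on a scalar: x + w_out * tanh(w_in * x)."""
    w_in: float
    w_out: float
    trainable: bool = True

    def __call__(self, x: float) -> float:
        return x + self.w_out * math.tanh(self.w_in * x)


def expand_blocks(blocks, group_size):
    """Freeze the original blocks and add one identity-initialised copy per group.

    Assumption: the copy duplicates the last block of each group and zeroes
    its output weight, so the residual branch contributes nothing at first.
    """
    expanded = []
    for index, block in enumerate(blocks, start=1):
        expanded.append(replace(block, trainable=False))  # original stays frozen
        if index % group_size == 0:
            expanded.append(replace(block, w_out=0.0, trainable=True))  # new block
    return expanded


def forward(blocks, x):
    """Run the input through the blocks in order."""
    for block in blocks:
        x = block(x)
    return x


base = [Block(0.5, 0.1), Block(-0.3, 0.2), Block(0.8, -0.4), Block(0.1, 0.7)]
expanded = expand_blocks(base, group_size=2)
```

Because each added block starts as an identity mapping, `forward(base, x)` and `forward(expanded, x)` agree before any training, and fine-tuning only has to update the blocks flagged `trainable`.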
Findings
Achieved a benchmark score of 33.4308, leading on the leaderboard.
Demonstrated improved comprehension and prediction accuracy in urban traffic scenarios.
Proved the effectiveness of sequential questioning in prediction augmentation.
Abstract
In the vast and dynamic landscape of urban settings, Traffic Safety Description and Analysis plays a pivotal role in applications ranging from insurance inspection to accident prevention. This paper introduces CityLLaVA, a novel fine-tuning framework for Visual Language Models (VLMs) designed for urban scenarios. CityLLaVA enhances model comprehension and prediction accuracy through (1) employing bounding boxes for optimal visual data preprocessing, including video best-view selection and visual prompt engineering during both training and testing phases; (2) constructing concise Question-Answer sequences and designing textual prompts to refine instruction comprehension; (3) implementing block expansion to fine-tune large VLMs efficiently; and (4) advancing prediction accuracy via a unique sequential questioning-based prediction augmentation. Demonstrating top-tier performance, our…
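The abstract says bounding boxes drive best-view selection but does not state the selection rule. The sketch below assumes one plausible criterion: pick the camera view in which the target's bounding boxes have the largest mean area. The function names, the `(x, y, width, height)` box layout, and the camera names are illustrative assumptions.

```python
from statistics import mean


def box_area(box):
    """Area of an (x, y, width, height) box; negative sizes count as zero."""
    _, _, width, height = box
    return max(width, 0) * max(height, 0)


def select_best_view(views):
    """Return the name of the view whose target boxes have the largest mean area.

    `views` maps a view name to the list of target boxes seen in that view.
    Views with no boxes are skipped. The mean-area rule is an assumption,
    not a rule taken from the paper.
    """
    scores = {
        name: mean([box_area(box) for box in boxes])
        for name, boxes in views.items()
        if boxes
    }
    if not scores:
        raise ValueError("no view contains a bounding box for the target")
    return max(scores, key=scores.get)


example_views = {
    "cam_a": [(0, 0, 10, 10), (0, 0, 20, 10)],  # mean area 150
    "cam_b": [(5, 5, 40, 30)],                  # mean area 1200
    "cam_c": [],                                # target never visible
}
best_view = select_best_view(example_views)
```

In this example `best_view` is `"cam_b"`, since the target appears largest there. A real pipeline might also weigh occlusion or how many frames contain the target.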
Taxonomy
Topics: Optical Wireless Communication Technologies · Semiconductor Lasers and Optical Devices · Advanced Optical Network Technologies
