Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition
Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

TL;DR
This paper introduces a cross-domain few-shot in-context learning approach using multimodal large language models to improve traffic sign recognition across different countries with minimal data.
Contribution
The paper proposes a method that combines traffic sign detection with description-text generation to enhance an MLLM's fine-grained recognition in cross-domain scenarios.
Findings
Significant performance improvement on multiple traffic sign datasets.
Effective reduction of cross-domain differences using description texts.
No need for large-scale traffic sign image datasets.
Abstract
Recent multimodal large language models (MLLMs) such as GPT-4o and GPT-4V have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce a cross-domain few-shot in-context learning method based on the MLLM. To enhance the MLLM's fine-grained recognition ability of traffic signs, the proposed method generates corresponding description texts using template traffic signs. These description texts contain key information about the shape, color, and composition of traffic signs, which can…
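The description-based few-shot prompting outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `SignExample` class, the `build_prompt` function, the prompt wording, and the example descriptions are all hypothetical, and the sketch only assembles the text part of a prompt. Sending it, together with the cropped sign image, to an MLLM is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SignExample:
    """One few-shot support example (hypothetical structure)."""
    label: str
    description: str  # shape, color, and composition of the template sign


def build_prompt(support: List[SignExample], candidate_labels: List[str]) -> str:
    """Assemble a few-shot in-context prompt from template sign descriptions.

    The wording is an illustrative assumption, not the prompt used in the paper.
    """
    lines = [
        "You are recognizing a traffic sign cropped from a road image.",
        "Reference descriptions of template signs:",
    ]
    # Each support example contributes one numbered description line.
    for i, ex in enumerate(support, start=1):
        lines.append(f"{i}. {ex.label}: {ex.description}")
    lines.append("Candidate labels: " + ", ".join(candidate_labels))
    lines.append(
        "Describe the shape, color, and composition of the query sign, "
        "then answer with exactly one candidate label."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative descriptions written for this sketch, not taken from the paper.
    support = [
        SignExample("stop", "red octagon with the white word STOP"),
        SignExample("yield", "inverted white triangle with a red border"),
    ]
    print(build_prompt(support, [ex.label for ex in support]))
```

Because the support examples are text descriptions of template signs rather than photographs from a particular country, the same prompt structure could in principle be reused across datasets by swapping the descriptions and candidate labels.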
Taxonomy
Topics: Advanced Neural Network Applications · Infrastructure Maintenance and Monitoring · Vehicle License Plate Recognition
Methods: Attention Is All You Need · Softmax · Byte Pair Encoding · Adapter · Layer Normalization · Linear Layer · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam
