Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign   Recognition

Yaozong Gan; Guang Li; Ren Togo; Keisuke Maeda; Takahiro Ogawa; Miki; Haseyama

arXiv:2407.05814·cs.CV·July 9, 2024

Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition

Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki, Haseyama

PDF

Open Access

TL;DR

This paper introduces a cross-domain few-shot in-context learning approach using multimodal large language models to improve traffic sign recognition across different countries with minimal data.

Contribution

It proposes a novel method combining traffic sign detection and description generation to enhance MLLM's fine-grained recognition in cross-domain scenarios.

Findings

01

Significant performance improvement on multiple traffic sign datasets.

02

Effective reduction of cross-domain differences using description texts.

03

No need for large-scale traffic sign image datasets.

Abstract

Recent multimodal large language models (MLLM) such as GPT-4o and GPT-4v have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce a cross-domain few-shot in-context learning method based on the MLLM. To enhance MLLM's fine-grained recognition ability of traffic signs, the proposed method generates corresponding description texts using template traffic signs. These description texts contain key information about the shape, color, and composition of traffic signs, which can…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Neural Network Applications · Infrastructure Maintenance and Monitoring · Vehicle License Plate Recognition

MethodsAttention Is All You Need · Softmax · Byte Pair Encoding · Adapter · Layer Normalization · Linear Layer · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam