Object Detection using Oriented Window Learning Vision Transformer: Roadway Assets Recognition

Taqwa Alhadidi, Ahmed Jaber, Shadi Jaradat, Huthaifa I Ashqar, and Mohammed Elhenawy

arXiv:2406.10712·cs.CV·June 18, 2024·1 cites

Open Access

TL;DR

This paper introduces OWL-ViT, a novel vision transformer model with oriented window learning, tailored for roadway asset detection in transportation systems, demonstrating high efficiency and robustness across diverse scenarios.

Contribution

It presents a new method integrating OWL-ViT within a one-shot learning framework for effective roadway asset recognition, addressing challenges of data variability and object diversity.

Findings

1. High detection accuracy across multiple roadway assets
2. Robust performance under varying resolutions and contexts
3. Enhanced detection consistency and reliability

Abstract

Object detection is a critical component of transportation systems, particularly for applications such as autonomous driving, traffic monitoring, and infrastructure maintenance. Traditional object detection methods often struggle with limited data and variability in object appearance. The Oriented Window Learning Vision Transformer (OWL-ViT) offers a novel approach by adapting window orientations to the geometry and existence of objects, making it highly suitable for detecting diverse roadway assets. This study presents a novel method for roadway asset detection that uses OWL-ViT within a one-shot learning framework to recognize transportation infrastructure components such as traffic signs, poles, pavement, and cracks. We conducted a series of experiments to evaluate the performance of the model in terms of detection consistency, semantic flexibility, visual…

Taxonomy

Topics: Neural Networks and Applications · Industrial Vision Systems and Defect Detection · Vehicle License Plate Recognition

Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer