ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB

Jeongjun Park; Sunwook Hwang; Hyeonho Noh; Jin Mo Yang; Hyun Jong Yang; and Saewoong Bahk

arXiv:2512.12206 · cs.CV · February 17, 2026

TL;DR

This paper introduces the ALERT dataset and a novel input-size-agnostic Vision Transformer framework for driver activity recognition using IR-UWB radar, addressing data scarcity and adaptability challenges in real-world distracted driving detection.

Contribution

The work presents the ALERT dataset of real-world UWB radar samples and proposes ISA-ViT, a flexible Vision Transformer that adapts to UWB data dimensions while preserving radar-specific features.

Findings

01 ISA-ViT achieves 22.68% higher accuracy than previous ViT-based methods.

02 The ALERT dataset contains 10,220 samples of seven distracted driving activities.

03 The proposed approach enhances robustness and scalability of driver activity recognition systems.

Abstract

Distracted driving contributes to fatal crashes worldwide. To address this, researchers are using driver activity recognition (DAR) with impulse radio ultra-wideband (IR-UWB) radar, which offers advantages such as interference resistance, low power consumption, and privacy preservation. However, two challenges limit its adoption: the lack of large-scale real-world UWB datasets covering diverse distracted driving behaviors, and the difficulty of adapting fixed-input Vision Transformers (ViTs) to UWB radar data with non-standard dimensions. This work addresses both challenges. We present the ALERT dataset, which contains 10,220 radar samples of seven distracted driving activities collected in real driving conditions. We also propose the input-size-agnostic Vision Transformer (ISA-ViT), a framework designed for radar-based DAR. The proposed method resizes UWB data to meet ViT input…
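To make the fixed-input constraint mentioned in the abstract concrete, the sketch below shows why a radar frame with non-standard dimensions cannot be split directly into ViT patches, and how a naive resize works around it. Everything here is an illustrative assumption rather than the paper's method: the 60 x 150 frame shape is hypothetical, the 224 x 224 input with 16 x 16 patches is the common ViT-Base configuration, and nearest-neighbour resizing is a stand-in that does not reproduce how ISA-ViT preserves radar-specific features.

```python
import numpy as np


def patch_grid(height, width, patch):
    """Return (rows, cols) of non-overlapping patches, or raise if the
    frame does not divide evenly into patch-sized tiles."""
    if height % patch or width % patch:
        raise ValueError(
            f"{height}x{width} is not divisible into {patch}x{patch} patches"
        )
    return height // patch, width // patch


def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array by index sampling.
    A naive stand-in, NOT the resizing scheme proposed in the paper."""
    in_h, in_w = frame.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return frame[rows[:, None], cols[None, :]]


# Hypothetical radar frame: 60 slow-time samples x 150 range bins (assumption).
frame = np.random.default_rng(0).standard_normal((60, 150))

# The raw frame does not tile into 16x16 patches.
try:
    patch_grid(*frame.shape, 16)
    raw_frame_fits = True
except ValueError:
    raw_frame_fits = False

# After resizing to the assumed ViT input size, the patch grid is well defined.
resized = resize_nearest(frame, 224, 224)
grid_rows, grid_cols = patch_grid(*resized.shape, 16)
num_tokens = grid_rows * grid_cols  # patch tokens fed to the transformer

print(raw_frame_fits, resized.shape, num_tokens)
```

Naive resizing of this kind stretches the two radar axes by different factors, which is one reason a generic image-style resize may distort the structure of the signal.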
