Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network

Yuanzhe Li; Steffen Müller

arXiv:2511.20008·cs.CV·March 25, 2026

TL;DR

This paper introduces a multimodal fusion network that combines visual and motion data through Transformer modules and attention mechanisms to predict pedestrian crossing intentions, with the goal of improving autonomous vehicle safety.

Contribution

The paper presents a novel multimodal fusion network with depth-guided, modality, and temporal attention modules for improved pedestrian crossing intention prediction.
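
The summary does not say how the modality and temporal attention modules are computed, so the following is only a minimal NumPy sketch under stated assumptions. Each of the seven modality features is assumed to be already embedded as a D-dimensional vector per frame. Modality attention is assumed to be a softmax over per-modality scores within each frame, and temporal attention a softmax over per-frame scores. The names `modality_attention`, `temporal_attention`, `predict_crossing`, the weight vectors `w`, `v`, `u`, `b`, and the sizes `T` and `D` are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def modality_attention(feats, w):
    """Fuse M modality embeddings into one vector per frame (assumed design).

    feats: (T, M, D) array, one D-dim embedding per modality per frame.
    w: hypothetical learned scoring vector of shape (D,).
    Returns fused features (T, D) and modality weights (T, M).
    """
    scores = feats @ w                     # (T, M) relevance score per modality
    alpha = softmax(scores, axis=1)        # weights over modalities sum to 1 per frame
    fused = np.einsum("tm,tmd->td", alpha, feats)
    return fused, alpha


def temporal_attention(seq, v):
    """Pool a (T, D) sequence into one (D,) vector with attention over frames."""
    scores = seq @ v                       # (T,) relevance score per frame
    beta = softmax(scores, axis=0)         # weights over frames sum to 1
    return beta @ seq, beta


def predict_crossing(feats, w, v, u, b):
    """Hypothetical binary head: probability that the pedestrian will cross."""
    fused, _ = modality_attention(feats, w)
    pooled, _ = temporal_attention(fused, v)
    logit = pooled @ u + b
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
T, M, D = 16, 7, 32  # 16 frames (assumed), 7 modalities (per the abstract), 32-dim embeddings (assumed)
feats = rng.standard_normal((T, M, D))
w, v, u = (rng.standard_normal(D) for _ in range(3))
p = predict_crossing(feats, w, v, u, 0.0)
print(f"crossing probability: {p:.3f}")
```

In a trained model the scoring vectors would be learned; here they are random, so the printed probability only demonstrates shapes and the (0, 1) output range. Attending over modalities first and over time second is one possible ordering, chosen here for simplicity.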

Findings

1. Achieves superior performance on the JAAD dataset
2. Effectively integrates multiple modalities for prediction
3. Outperforms baseline methods in accuracy

Abstract

Pedestrian crossing intention prediction is essential for the deployment of autonomous vehicles (AVs) in urban environments. Accurate prediction provides AVs with critical environmental cues, thereby reducing the risk of pedestrian-related collisions. However, the prediction task is challenging due to the diverse nature of pedestrian behavior and its dependence on multiple contextual factors. This paper proposes a multimodal fusion network that leverages seven modality features from both visual and motion branches, aiming to effectively extract and integrate complementary cues across different modalities. Specifically, motion and visual features are extracted from the raw inputs using multiple Transformer-based extraction modules. A depth-guided attention module leverages depth information to guide attention towards salient regions in another modality through comprehensive spatial feature…
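
The abstract is cut off before it details the depth-guided attention module. One plausible reading, sketched below in NumPy purely as an assumption, is spatial gating: depth features are projected to a single-channel map, squashed into (0, 1), and used to reweight a visual feature map at each location. The function name `depth_guided_attention`, the 1x1 projection weights `w_d`, the residual `1 + mask` form, and the feature-map sizes are all hypothetical and not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def depth_guided_attention(rgb_feat, depth_feat, w_d):
    """Reweight a visual feature map with a spatial mask computed from depth (assumed design).

    rgb_feat:   (C, H, W) visual feature map.
    depth_feat: (C_d, H, W) depth feature map aligned to the same spatial grid.
    w_d:        hypothetical (C_d,) weights of a 1x1 conv projecting depth to one channel.
    Returns the reweighted map (C, H, W) and the mask (H, W) with values in (0, 1).
    """
    logits = np.einsum("c,chw->hw", w_d, depth_feat)  # 1x1 conv down to a single map
    mask = sigmoid(logits)                             # per-location saliency from depth
    # Residual form (assumed): locations with a low mask keep their original features.
    out = rgb_feat * (1.0 + mask[None, :, :])
    return out, mask


rng = np.random.default_rng(1)
rgb = rng.standard_normal((64, 14, 14))   # assumed 64-channel, 14x14 visual features
depth = rng.standard_normal((8, 14, 14))  # assumed 8-channel depth features
w_d = rng.standard_normal(8)
out, mask = depth_guided_attention(rgb, depth, w_d)
print(out.shape, float(mask.min()), float(mask.max()))
```

The residual `1 + mask` choice means depth can only amplify regions it marks as salient, never erase visual evidence outright; a plain `rgb_feat * mask` gate would be the stricter alternative.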

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Multimodal Machine Learning Applications