AsyncBEV: Cross-modal Flow Alignment in Asynchronous 3D Object Detection
Shiming Wang, Holger Caesar, Liangliang Nan, Julian F. P. Kooij

TL;DR
AsyncBEV is a novel module that enhances 3D object detection robustness in autonomous driving by aligning asynchronous multi-modal sensor data using learned scene flow estimation, significantly improving detection of dynamic objects.
Contribution
The paper introduces AsyncBEV, a trainable, lightweight module that aligns asynchronous sensor data in 3D BEV detection, improving robustness against sensor time offsets.
Findings
Improves NDS on dynamic objects by 16.6% at a 0.5 s sensor time offset
Effective for both token-based and grid-based BEV detectors
Significantly outperforms baseline methods in asynchronous scenarios
Abstract
In autonomous driving, multi-modal perception tasks like 3D object detection typically rely on well-synchronized sensors, both at training and inference. However, despite the use of hardware- or software-based synchronization algorithms, perfect synchrony is rarely guaranteed: sensors may operate at different frequencies, and real-world factors such as network latency, hardware failures, or processing bottlenecks often introduce time offsets between sensors. Such asynchrony degrades perception performance, especially for dynamic objects. To address this challenge, we propose AsyncBEV, a trainable, lightweight, and generic module to improve the robustness of 3D Bird's Eye View (BEV) object detection models against sensor asynchrony. Inspired by scene flow estimation, AsyncBEV first estimates the 2D flow from the BEV features of two different sensor modalities, taking into account the known…
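The alignment idea described above, estimating a 2D flow in BEV space and using it to align features from an asynchronous sensor, can be illustrated with a minimal sketch. The function below backward-warps a single-channel BEV grid by a per-cell flow with nearest-neighbor sampling. The name `warp_bev`, the list-of-lists grid layout, and the sampling scheme are assumptions made for illustration; this is not the authors' implementation, and in the paper the flow is predicted by a learned module rather than supplied by hand.

```python
from typing import List

Grid = List[List[float]]  # hypothetical layout: grid[row][col], one feature channel


def warp_bev(features: Grid, flow_x: Grid, flow_y: Grid) -> Grid:
    """Backward-warp a BEV grid by a per-cell 2D flow given in cell units.

    Output cell (y, x) takes its value from input cell
    (y - flow_y[y][x], x - flow_x[y][x]), rounded to the nearest cell.
    Cells whose source falls outside the grid are set to 0.0.
    """
    h, w = len(features), len(features[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Python's round() rounds exact halves to the nearest even integer.
            sx = round(x - flow_x[y][x])
            sy = round(y - flow_y[y][x])
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = features[sy][sx]
    return out


# Usage: one active cell at (row 1, col 1) is moved by 2 columns and 1 row.
feat = [[0.0] * 4 for _ in range(4)]
feat[1][1] = 1.0
fx = [[2.0] * 4 for _ in range(4)]
fy = [[1.0] * 4 for _ in range(4)]
warped = warp_bev(feat, fx, fy)
```

A real detector would typically apply a differentiable bilinear warp to multi-channel tensors so that the flow estimator can be trained end to end; the nearest-neighbor version here only shows the indexing.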
Peer Reviews
Decision: ICLR 2026 Poster
- Conceptually intuitive and interpretable formulation for feature alignment under temporal misalignment and dynamic motion.
- Generalizable design compatible with both grid and token BEV frameworks.
- Consistent improvements across offset magnitudes with negligible runtime overhead.

- Lacks discussion of real-world latency handling in AV stacks, where stale data (>100 ms) are typically discarded. Comparison to such baselines (e.g., camera-only inference or temporal propagation) would clarify practical value.
- Limited benchmarking against contemporary temporal alignment methods like StreamingFlow.
- Performance could be impacted by stochastic latency profiles, but no analysis is provided on sensitivity to time offset estimation error.
**High Practical Relevance** Sensor asynchrony is unavoidable in real-world autonomous driving, yet it is often overlooked in detector design. By targeting this gap, AsyncBEV directly improves the safety and reliability of perception systems—particularly for dynamic objects (e.g., pedestrians, moving vehicles), which are the primary cause of accidents. This makes the work valuable for both academic research and industrial deployment. **Effective Δ-BEVFlow Design** Δ-BEVFlow addresses key lim
**Limited Novelty Compared to Prior Asynchrony-Robust Work** The core idea of using BEV flow for asynchrony compensation is not entirely new. For example, CoBEVFlow (Wei et al., NeurIPS 2023) already uses BEV flow to handle asynchronous collaborative perception, though it relies on object proposals and is more computationally heavy. Additionally, recent work like UniV2X (Yu et al., AAAI 2025) explores end-to-end autonomous driving with V2X cooperation, which also involves addressing multi-agent
- Addressing robustness against sensor asynchrony is important from a practical viewpoint since autonomous vehicles often come equipped with multiple sensors.
- The paper is well-written and easy to follow. The description is detailed and the figures (Fig. 2, 3, 5) are informative.
- The AsyncBEV module is generic and lightweight and can be combined with both grid-based and token-based BEV detectors.
- The proposed module predicts the delta 2D scene flow in BEV space, conditioned on delta tim
- In Sec.4.2, for the finetuning UniBEV variant, is the delta timestep offset (between reference and asynchronous sensor) also used as input? It'd be useful to have a finetuning baseline that also incorporates delta timestep offset. For example, LiDAR BEV can be augmented with the delta timestep (on a per-point basis) as an additional feature channel (a similar strategy was also used in the Fan et al. 2025 referenced paper). This would help understand if the delta flow formulation is indeed effe
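The finetuning baseline suggested in this review, attaching the time offset to each LiDAR point as an additional feature channel, could look roughly like the sketch below. The function name `append_time_offset`, the `(x, y, z, intensity)` point layout, and the sign convention (reference time minus point capture time) are all assumptions for illustration; the review does not specify them.

```python
from typing import List, Sequence


def append_time_offset(
    points: Sequence[Sequence[float]],
    point_times: Sequence[float],
    ref_time: float,
) -> List[List[float]]:
    """Append a per-point time-offset channel to a LiDAR point list.

    points:      per-point features, e.g. (x, y, z, intensity) (assumed layout).
    point_times: capture timestamp of each point, in seconds.
    ref_time:    timestamp of the reference sensor, in seconds.
    The new last channel is ref_time - point_time (assumed sign convention).
    """
    if len(points) != len(point_times):
        raise ValueError("points and point_times must have the same length")
    return [list(p) + [ref_time - t] for p, t in zip(points, point_times)]


# Usage: two points captured 0.5 s and 0.25 s before the reference frame.
pts = [(1.0, 2.0, 3.0, 0.5), (4.0, 5.0, 6.0, 0.1)]
augmented = append_time_offset(pts, point_times=[1.0, 1.25], ref_time=1.5)
```

A detector finetuned on such inputs would see the offset directly, which is what would let this baseline be compared against the delta-flow formulation.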
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
