CollideNet: Hierarchical Multi-scale Video Representation Learning with Disentanglement for Time-To-Collision Forecasting

Nishq Poorav Desai; Ali Etemad; Michael Greenspan

arXiv:2604.16240·cs.CV·April 20, 2026

TL;DR

CollideNet is a hierarchical transformer architecture for accurate time-to-collision (TTC) forecasting from video. It captures multi-scale spatial and temporal patterns and disentangles the non-stationarity, trend, and seasonality components of the temporal signal.
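For context on the forecasting target: under a constant-velocity assumption, TTC is conventionally the current gap between two objects divided by their closing speed. The small sketch below illustrates that textbook definition only. The function name `time_to_collision` and its units (metres, metres per second) are illustrative assumptions; CollideNet itself predicts TTC from video rather than computing it from measured distances.

```python
import math


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Constant-velocity TTC in seconds: gap divided by closing speed.

    Hypothetical helper for illustration. Returns math.inf when the objects
    are not approaching (closing speed <= 0), since no collision is predicted.
    """
    if closing_speed_mps <= 0:
        return math.inf
    return gap_m / closing_speed_mps


print(time_to_collision(30.0, 10.0))  # 30 m gap closing at 10 m/s -> 3.0 s
```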

Contribution

Introduces a hierarchical transformer model that captures multi-scale spatial and temporal features and disentangles non-stationarity, trend, and seasonality, improving TTC forecasting accuracy.
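The summary does not describe how the spatial stream combines resolutions, so the following is only a hypothetical sketch of the general idea of per-frame multi-scale aggregation: one frame's feature map is average-pooled at several scales and each pooled map is flattened into tokens that a transformer could attend over jointly. The helper names `avg_pool_2d` and `multi_scale_tokens` and the scale set `(1, 2, 4)` are assumptions, not CollideNet's actual design.

```python
import numpy as np


def avg_pool_2d(x: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k x k average pooling over an (H, W, C) feature map.

    Hypothetical helper; H and W must be divisible by k.
    """
    h, w, c = x.shape
    assert h % k == 0 and w % k == 0, "spatial dims must be divisible by k"
    # Split each spatial axis into (blocks, k) and average inside each block.
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))


def multi_scale_tokens(frame_feats: np.ndarray, scales=(1, 2, 4)) -> np.ndarray:
    """Pool one frame at several resolutions and stack all pooled cells as tokens.

    Returns a (num_tokens, C) array: fine-scale tokens first, coarse-scale last.
    """
    c = frame_feats.shape[-1]
    tokens = [avg_pool_2d(frame_feats, k).reshape(-1, c) for k in scales]
    return np.concatenate(tokens, axis=0)


feats = np.random.default_rng(0).standard_normal((8, 8, 16))
print(multi_scale_tokens(feats).shape)  # 64 + 16 + 4 tokens of width 16 -> (84, 16)
```

Concatenating tokens from all scales lets attention relate fine local detail to coarse global context within a single frame, which is the intuition behind the "multiple resolutions" wording; the real model may fuse scales differently.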

Findings

1. Achieves state-of-the-art performance on three public datasets.
2. Demonstrates strong generalization in cross-dataset evaluations.
3. Visualizes the disentanglement of trend and seasonality in video data.
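One widely used way to separate trend from seasonality in a sequence is moving-average series decomposition: the trend is a centred moving average over time, and the seasonal part is the residual. The sketch below applies this to a (T, C) sequence of per-frame features. It is an illustrative stand-in under that assumption; the summary does not say that CollideNet uses this exact block, and the name `series_decomposition` and the default `kernel=5` are hypothetical.

```python
import numpy as np


def series_decomposition(x: np.ndarray, kernel: int = 5):
    """Split a (T, C) feature sequence into trend + seasonal parts.

    Trend: centred moving average along time. Seasonal: x - trend.
    Edges are padded by repeating the first/last frame so both outputs keep length T.
    """
    assert kernel % 2 == 1, "use an odd kernel so the window is centred"
    pad = kernel // 2
    padded = np.concatenate(
        [np.repeat(x[:1], pad, axis=0), x, np.repeat(x[-1:], pad, axis=0)], axis=0
    )
    # Cumulative-sum trick: window sum = csum[i + kernel] - csum[i].
    csum = np.cumsum(np.concatenate([np.zeros((1, x.shape[1])), padded], axis=0), axis=0)
    trend = (csum[kernel:] - csum[:-kernel]) / kernel
    seasonal = x - trend
    return trend, seasonal


t = np.arange(40, dtype=float)
signal = (0.1 * t + np.sin(2 * np.pi * t / 5)).reshape(-1, 1)  # ramp + period-5 oscillation
trend, seasonal = series_decomposition(signal, kernel=5)
```

With a window equal to the oscillation period, the moving average cancels the periodic part away from the edges, so `trend` recovers the ramp and `seasonal` holds the oscillation; a learned model would typically apply such a split to feature sequences rather than raw signals.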

Abstract

Time-to-Collision (TTC) forecasting is a critical task in collision prevention. It requires precise temporal prediction and an understanding of both local and global patterns in a video, spatially as well as temporally. To address the multi-scale nature of video, we introduce CollideNet, a novel spatiotemporal hierarchical transformer-based architecture tailored to TTC forecasting. In the spatial stream, CollideNet aggregates information for each video frame at multiple resolutions simultaneously. In the temporal stream, in addition to multi-scale feature encoding, CollideNet disentangles the non-stationarity, trend, and seasonality components. Our method outperforms prior work on three commonly used public datasets, setting a new state of the art by a considerable margin. We conduct cross-dataset evaluations…
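The abstract does not specify how CollideNet handles non-stationarity. A common technique in time-series forecasting is reversible per-window normalization: each input window is standardized with its own mean and standard deviation, and those statistics are kept so the mapping can be undone or passed to later layers. The minimal sketch below assumes that technique; `normalize_window`, `denormalize`, and `eps` are hypothetical names, not the paper's code.

```python
import numpy as np


def normalize_window(x: np.ndarray, eps: float = 1e-5):
    """Standardize a (T, C) feature window with its own per-channel statistics.

    Returns the stationarized window and the (mean, std) needed to invert it.
    eps guards against division by zero for constant channels.
    """
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True) + eps
    return (x - mean) / std, (mean, std)


def denormalize(y: np.ndarray, stats) -> np.ndarray:
    """Invert normalize_window using the stored per-channel statistics."""
    mean, std = stats
    return y * std + mean


window = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(16, 8))
z, stats = normalize_window(window)
restored = denormalize(z, stats)
```

Removing per-window level and scale makes windows from different scenes more comparable, while the stored statistics preserve the information that was removed; how (or whether) such statistics are re-injected in CollideNet is not stated in this summary.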

Code & Models

Repositories

DeSinister/CollideNet (GitHub)
