TL;DR
The paper introduces Video-to-Voxel (V2V), a method that converts conventional videos directly into event-based voxel grids. This cuts storage requirements by 150x and enables large-scale training of event vision models.
Contribution
V2V is an efficient, storage-saving approach for generating synthetic event data directly from videos, which makes large-scale training practical and improves model robustness.
Findings
Achieved a 150x reduction in storage requirements.
Trained models on 10,000 videos, a dataset significantly larger than existing event-based training sets.
Demonstrated substantial improvements on event-based video reconstruction and optical flow estimation.
Abstract
Event-based cameras offer unique advantages such as high temporal resolution, high dynamic range, and low power consumption. However, the massive storage requirements and I/O burdens of existing synthetic data generation pipelines and the scarcity of real data prevent event-based training datasets from scaling up, limiting the development and generalization capabilities of event vision models. To address this challenge, we introduce Video-to-Voxel (V2V), an approach that directly converts conventional video frames into event-based voxel grid representations, bypassing the storage-intensive event stream generation entirely. V2V enables a 150 times reduction in storage requirements while supporting on-the-fly parameter randomization for enhanced model robustness. Leveraging this efficiency, we train several video reconstruction and optical flow estimation model architectures on 10,000…
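The abstract does not spell out how V2V maps frames to voxels. To give a rough picture of the idea of skipping the explicit event stream, here is a minimal sketch under stated assumptions; it is not the authors' implementation. The sketch assumes an ESIM-style contrast-threshold model applied directly between evenly spaced frames. It also assumes a per-clip random contrast threshold `C` as the form of "on-the-fly parameter randomization", and that signed crossing counts are splatted into temporal bins with linear interpolation. The function name `frames_to_voxel` and its parameters (`num_bins`, `c_range`, `eps`) are hypothetical.

```python
import numpy as np


def frames_to_voxel(frames, num_bins=5, c_range=(0.15, 0.35), eps=1e-3, rng=None):
    """Hypothetical sketch: approximate an event voxel grid directly from frames.

    frames: array (T, H, W) of positive grayscale intensities, assumed evenly
    spaced in time. Returns a (num_bins, H, W) grid of signed event counts.
    No event stream is ever materialized.
    """
    rng = np.random.default_rng() if rng is None else rng
    frames = np.asarray(frames, dtype=np.float64)
    T, H, W = frames.shape
    # Assumed randomization: draw one contrast threshold per clip.
    C = rng.uniform(*c_range)
    log_frames = np.log(frames + eps)
    ref = log_frames[0].copy()  # per-pixel log intensity at the last "event"
    voxel = np.zeros((num_bins, H, W))
    for t in range(1, T):
        diff = log_frames[t] - ref
        # Signed number of threshold crossings since the last event per pixel.
        n = np.trunc(diff / C)
        ref += n * C  # advance the reference only by the crossings emitted
        # Map frame t's normalized timestamp onto the bin axis [0, num_bins - 1].
        tau = (t / (T - 1)) * (num_bins - 1)
        lo = int(np.floor(tau))
        hi = min(lo + 1, num_bins - 1)
        w_hi = tau - lo
        # Linear interpolation of the counts between the two nearest bins.
        voxel[lo] += (1.0 - w_hi) * n
        if hi != lo:
            voxel[hi] += w_hi * n
    return voxel
```

Because every call can draw a fresh threshold, only the source videos need to be stored, and each training epoch sees differently parameterized voxel grids. This illustrates the general storage trade-off the abstract describes. It is a simplification: events land only at frame timestamps, and no sub-frame timing or noise is modeled.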
