TL;DR
This paper introduces a zero-shot event matching model for event cameras that achieves wide-baseline correspondence without target-domain fine-tuning, using a novel attention backbone and synthetic data generation.
Contribution
The authors propose the first zero-shot event matching model capable of cross-dataset wide-baseline correspondence, with a new attention backbone and synthetic data framework.
Findings
Achieves a 37.7% improvement over previous methods.
Enables cross-dataset wide-baseline matching without fine-tuning.
Uses a novel motion-robust attention backbone with multi-timescale features.
Abstract
Event cameras have recently shown promising capabilities in instantaneous motion estimation due to their robustness to low light and fast motions. However, computing wide-baseline correspondence between two arbitrary views remains a significant challenge, since event appearance changes substantially with motion, and learning-based approaches are constrained by both scalability and limited wide-baseline supervision. We therefore introduce the first event matching model that achieves cross-dataset wide-baseline correspondence in a zero-shot manner: a single model trained once is deployed on unseen datasets without any target-domain fine-tuning or adaptation. To enable this capability, we introduce a motion-robust and computationally efficient attention backbone that learns multi-timescale features from event streams, augmented with sparsity-aware event token selection, making large-scale…
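To make the two ideas named in the abstract more concrete (multi-timescale features and sparsity-aware event token selection), here is a minimal sketch in plain Python. It is not the authors' backbone, which the abstract does not specify. It only assumes that events arrive as `(x, y, t, polarity)` tuples with integer microsecond timestamps, that a "token" is an image patch, and that a patch is kept when it contains enough events. The names `patch_counts` and `select_tokens`, the patch size, the window lengths, and the `min_events` threshold are all hypothetical choices for illustration.

```python
from collections import Counter


def patch_counts(events, t_ref, window, patch=8):
    """Count events per (patch_row, patch_col) with t in [t_ref - window, t_ref]."""
    counts = Counter()
    for x, y, t, _polarity in events:
        if t_ref - window <= t <= t_ref:
            counts[(y // patch, x // patch)] += 1
    return counts


def select_tokens(events, t_ref, windows=(10_000, 50_000), patch=8, min_events=2):
    """Keep only patches with enough events in the longest window.

    Each kept patch gets one event count per time window, which serves as a
    (very crude) multi-timescale feature. Empty or near-empty patches are
    dropped, so the token count scales with scene activity, not image size.
    """
    per_scale = [patch_counts(events, t_ref, w, patch) for w in windows]
    longest = per_scale[windows.index(max(windows))]
    tokens = {}
    for key, n in longest.items():
        if n >= min_events:
            # Counter returns 0 for patches with no events at a given scale.
            tokens[key] = [counts[key] for counts in per_scale]
    return tokens


# Toy event stream: (x, y, timestamp in microseconds, polarity).
events = [
    (1, 1, 95_000, 1),
    (2, 3, 99_000, -1),
    (3, 2, 60_000, 1),
    (20, 20, 98_000, 1),
]
tokens = select_tokens(events, t_ref=100_000)
print(tokens)  # {(0, 0): [2, 3]}
```

In this toy stream, the patch at (0, 0) holds two events in the 10 ms window and three in the 50 ms window, so it is kept; the patch containing the lone event at (20, 20) falls below the threshold and is dropped. A learned model would replace the raw counts with learned features, but the selection step shows why skipping sparse regions can reduce the number of tokens passed to attention.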
