TETO: Tracking Events with Teacher Observation for Motion Estimation and Frame Interpolation
Jini Yang, Eunbeen Hong, Soowon Son, Hyunkoo Lee, Sunghwan Hong, Sunok Kim, Seungryong Kim

TL;DR
This paper introduces TETO, a teacher-student framework that learns event-based motion estimation from about 25 minutes of unannotated real-world recordings, improving point tracking, optical flow, and frame interpolation.
Contribution
TETO is presented as the first approach to learn event-based motion estimation solely from real-world data, using knowledge distillation from a pretrained RGB tracker and reducing reliance on synthetic datasets.
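As an illustration of what distilling from an RGB tracker could look like, the sketch below treats the teacher's point positions as pseudo-labels and penalizes the student's deviation from them. The summary does not state the loss used in TETO, so the Huber form, the visibility mask, and the names `huber` and `distillation_loss` are assumptions made for this example.

```python
import math


def huber(residual, delta=1.0):
    """Huber penalty: quadratic near zero, linear for large residuals."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def distillation_loss(student_tracks, teacher_tracks, teacher_visible, delta=1.0):
    """Mean Huber loss between student and teacher (x, y) positions.

    Assumption: points the teacher marks as not visible are skipped,
    since their pseudo-labels are likely unreliable.
    """
    total, count = 0.0, 0
    for s_pt, t_pt, vis in zip(student_tracks, teacher_tracks, teacher_visible):
        if not vis:
            continue
        dist = math.hypot(s_pt[0] - t_pt[0], s_pt[1] - t_pt[1])
        total += huber(dist, delta)
        count += 1
    return total / count if count else 0.0


# Student predictions from events vs. teacher predictions from RGB frames.
student = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
teacher = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
visible = [True, True, False]
loss = distillation_loss(student, teacher, visible)
```

In this toy case only the first two points contribute, because the third is masked out by the teacher's visibility flag.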
Findings
State-of-the-art point tracking on EVIMO2
Superior optical flow on DSEC
Enhanced frame interpolation quality on BS-ERGB and HQ-EVFI
Abstract
Event cameras capture per-pixel brightness changes with microsecond resolution, offering continuous motion information lost between RGB frames. However, existing event-based motion estimators depend on large-scale synthetic data that often suffers from a significant sim-to-real gap. We propose TETO (Tracking Events with Teacher Observation), a teacher-student framework that learns event motion estimation from only 25 minutes of unannotated real-world recordings through knowledge distillation from a pretrained RGB tracker. Our motion-aware data curation and query sampling strategy maximizes learning from limited data by disentangling object motion from dominant ego-motion. The resulting estimator jointly predicts point trajectories and dense optical flow, which we leverage as explicit motion priors to condition a pretrained video diffusion transformer for frame interpolation. We…
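One way to picture the motion-aware query sampling described above: approximate the dominant ego-motion as the median displacement of candidate points, then sample queries with probability proportional to how far each point's motion departs from it. This is a rough sketch under that assumption, not the curation procedure from the paper; `residual_motion_weights`, `sample_queries`, and the median-translation model are hypothetical choices for illustration.

```python
import math
import random
from statistics import median


def residual_motion_weights(displacements, eps=1e-3):
    """Weight each point by its motion relative to the dominant motion.

    Assumption: ego-motion is approximated by the per-axis median
    displacement, a crude global-translation model.
    """
    med_dx = median(d[0] for d in displacements)
    med_dy = median(d[1] for d in displacements)
    return [math.hypot(dx - med_dx, dy - med_dy) + eps for dx, dy in displacements]


def sample_queries(displacements, k, seed=0):
    """Sample k query indices (with replacement), favoring object motion."""
    weights = residual_motion_weights(displacements)
    rng = random.Random(seed)
    return rng.choices(range(len(displacements)), weights=weights, k=k)


# Four points follow the camera motion; the last one moves independently.
disp = [(2.0, 0.0), (2.0, 0.0), (2.0, 0.0), (2.0, 0.0), (8.0, 3.0)]
weights = residual_motion_weights(disp)
queries = sample_queries(disp, k=3)
```

The small `eps` keeps background points from having exactly zero probability, so the sampler still draws some queries on ego-motion regions.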
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Generative Adversarial Networks and Image Synthesis
