Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion
Shaowei Liu, David Yifan Yao, Saurabh Gupta, Shenlong Wang

TL;DR
VisualSync is an optimization framework that synchronizes multiple unposed, unsynchronized videos by exploiting multi-view dynamics and epipolar constraints, reaching millisecond-level accuracy without specialized hardware.
Contribution
VisualSync combines epipolar geometry with multi-view tracking to synchronize videos without prior calibration or manual intervention.
Findings
Achieves median synchronization error below 50 ms
Outperforms baseline methods on diverse datasets
Works with unposed, unsynchronized videos in challenging scenarios
Abstract
Today, people can easily record memorable moments, from concerts, sports events, and lectures to family gatherings and birthday parties, with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods assume controlled settings, specific targets, manual correction, or costly hardware. We present VisualSync, an optimization framework based on multi-view dynamics that aligns unposed, unsynchronized videos at millisecond accuracy. Our key insight is that any moving 3D point, when co-visible in two cameras, obeys epipolar constraints once properly synchronized. To exploit this, VisualSync leverages off-the-shelf 3D reconstruction, feature matching, and dense tracking to extract tracklets, relative poses, and cross-view correspondences. It then jointly minimizes the epipolar error to estimate each camera's time offset. Experiments on…
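The key insight in the abstract can be illustrated with a toy two-camera example. The sketch below is not the authors' implementation. It assumes the relative pose (and therefore the fundamental matrix) is already known, both cameras share a frame rate, the offset is a whole number of frames, and a single tracked point is matched across views. Every function name, the synthetic trajectory, and the brute-force search are hypothetical choices made for illustration. The Sampson distance stands in for "epipolar error"; the paper may use a different error measure and optimizer.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """F for cameras related by X2 = R @ X1 + t, so that x2^T F x1 = 0."""
    E = skew(t) @ R
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def project(K, R, t, pts):
    """Project (N, 3) world points to (N, 2) pixels for a camera X_cam = R @ X + t."""
    pix = (pts @ R.T + t) @ K.T
    return pix[:, :2] / pix[:, 2:3]

def sampson_error(F, x1, x2):
    """Per-pair Sampson distance (squared pixels) for (N, 2) pixel arrays."""
    h1 = np.hstack([x1, np.ones((len(x1), 1))])
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    Fx1 = h1 @ F.T    # each row is F @ x1
    Ftx2 = h2 @ F     # each row is F^T @ x2
    num = np.sum(h2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den

def estimate_shift(F, track1, track2, max_shift):
    """Brute-force search for the frame shift s that minimizes epipolar error.

    Hypothesis for each s: camera-2 frame j shows the same instant as
    camera-1 frame j + s, so track1[i] is paired with track2[i - s].
    """
    n = min(len(track1), len(track2))
    errors = {}
    for s in range(-max_shift, max_shift + 1):
        i = np.arange(max(0, s), min(n, n + s))  # camera-1 frames that have a partner
        errors[s] = float(np.mean(sampson_error(F, track1[i], track2[i - s])))
    best = min(errors, key=errors.get)
    return best, errors

def point_at(times):
    """Synthetic curved 3D trajectory (an assumption, chosen to avoid degenerate motion)."""
    times = np.asarray(times, dtype=float)
    return np.stack([np.sin(0.3 * times),
                     0.5 * np.cos(0.2 * times),
                     5.0 + 0.3 * np.sin(0.17 * times)], axis=1)

# Synthetic setup: camera 1 at the origin, camera 2 rotated 20 degrees about y.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(20.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.2])
F = fundamental_from_pose(K, K, R, t)

true_shift = 7  # camera 2 started recording 7 frames after camera 1
frames = np.arange(60)
track1 = project(K, np.eye(3), np.zeros(3), point_at(frames))
track2 = project(K, R, t, point_at(frames + true_shift))

estimated, errors = estimate_shift(F, track1, track2, max_shift=15)
print("estimated shift:", estimated)
```

With noise-free synthetic tracks, the error vanishes only at the shift where paired observations show the same physical instant, which is what makes the time offset recoverable from geometry alone. A real pipeline would need many tracks, robust handling of outliers and noisy poses, sub-frame offsets, and a joint solve over all camera pairs, none of which this sketch attempts.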
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
