Beyond Audio and Pose: A General-Purpose Framework for Video Synchronization
Yosub Shin, Igor Molybog

TL;DR
This paper introduces VideoSync, a versatile video synchronization framework that does not depend on any specific feature extraction method, offering a more generalizable approach backed by rigorous evaluation and improved performance.
Contribution
The work presents VideoSync, a general-purpose video synchronization framework, evaluates it on diverse datasets, corrects biases in prior methods, and establishes reproducible benchmarks.
Findings
VideoSync outperforms existing methods such as SeSyn-Net when compared under fair conditions.
A CNN-based model is identified as the most effective for predicting the temporal offset between videos.
The framework is applicable across single-human, multi-human, and non-human scenarios.
Abstract
Video synchronization, the task of aligning multiple video streams that capture the same event from different angles, is crucial for applications such as reality TV show production, sports analysis, surveillance, and autonomous systems. Prior work has relied heavily on audio cues or specific visual events, which limits its applicability in diverse settings where such signals may be unreliable or absent. Additionally, existing benchmarks for video synchronization lack generality and reproducibility, restricting progress in the field. In this work, we introduce VideoSync, a video synchronization framework that operates independently of specific feature extraction methods, such as human pose estimation, enabling broader applicability across different content types. We evaluate our system on newly composed datasets covering single-human, multi-human, and non-human scenarios, providing both the methodology and code…
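The abstract frames synchronization as recovering the time offset between streams from features produced by an arbitrary extractor. As an illustration only, the sketch below is a minimal non-learned baseline, not VideoSync's method (the summary reports that a CNN-based model works best for offset prediction). It assumes per-frame feature vectors have already been extracted, by pose estimation or any other method, at a common frame rate, then brute-force searches integer offsets and keeps the one with the lowest mean squared distance over overlapping frames. The names `estimate_offset`, `frame_distance`, `max_offset`, and `min_overlap` are hypothetical.

```python
from typing import Sequence

# Hypothetical illustration: a per-frame feature vector from ANY extractor.
Feature = Sequence[float]


def frame_distance(x: Feature, y: Feature) -> float:
    """Squared Euclidean distance between two per-frame feature vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def estimate_offset(
    a: Sequence[Feature],
    b: Sequence[Feature],
    max_offset: int,
    min_overlap: int = 1,
) -> int:
    """Return the integer offset k for which a[i] best matches b[i + k].

    Hypothetical brute-force baseline (not the paper's CNN predictor):
    every k in [-max_offset, max_offset] is scored by the mean distance
    over the frames where both streams overlap; the lowest score wins.
    """
    best_k, best_score = None, float("inf")
    for k in range(-max_offset, max_offset + 1):
        # Indices i where both a[i] and b[i + k] exist.
        start = max(0, -k)
        stop = min(len(a), len(b) - k)
        overlap = stop - start
        if overlap < min_overlap:
            continue  # too few shared frames to trust this candidate
        total = sum(frame_distance(a[i], b[i + k]) for i in range(start, stop))
        score = total / overlap
        if score < best_score:  # strict '<' keeps the first k on ties
            best_k, best_score = k, score
    if best_k is None:
        raise ValueError("no candidate offset has enough overlapping frames")
    return best_k


if __name__ == "__main__":
    # Toy 1-D features: stream `a` runs 3 frames ahead of stream `b`.
    base = [[float((7 * t) % 11)] for t in range(40)]
    a = base[5:35]
    b = base[2:32]
    print(estimate_offset(a, b, max_offset=5))  # prints 3
```

Because the search consumes only feature vectors, it works regardless of which extractor produced them, which is the extractor-independence the abstract emphasizes; a learned offset predictor would replace the fixed distance and exhaustive search with a trained model.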
Taxonomy
Topics: Multimedia Communication and Technology · Video Analysis and Summarization · Subtitles and Audiovisual Media
