# RocSync: Millisecond-Accurate Temporal Synchronization for Heterogeneous Camera Systems

**Authors:** Jaro Meyer, Frédéric Giraud, Joschua Wüthrich, Marc Pollefeys, Philipp Fürnstahl, Lilian Calvet

PMC · DOI: 10.3390/s26031036 · Sensors (Basel, Switzerland) · 2026-02-05

## TL;DR

RocSync provides a low-cost method to synchronize different types of cameras with millisecond accuracy, improving 3D reconstruction and pose estimation in real-world settings.

## Contribution

A general-purpose, low-cost synchronization method achieving millisecond-level accuracy across heterogeneous camera systems.

## Key findings

- RocSync achieves a residual error of 1.34 ms RMSE relative to hardware synchronization across multiple recordings.
- The method outperforms light-, audio-, and timecode-based synchronization approaches.
- It improves downstream tasks like multi-view pose estimation and 3D reconstruction.
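The headline accuracy is a root-mean-square error (RMSE) of residuals measured against hardware synchronization. The sketch below only illustrates the metric, assuming paired time offsets in milliseconds. The function name `rmse` and the sample values are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence


def rmse(estimated_ms: Sequence[float], reference_ms: Sequence[float]) -> float:
    """Root-mean-square error between paired time offsets, in milliseconds."""
    if len(estimated_ms) != len(reference_ms) or not estimated_ms:
        raise ValueError("need two non-empty sequences of equal length")
    # Residual = estimated offset minus hardware-synchronized reference offset.
    residuals = [e - r for e, r in zip(estimated_ms, reference_ms)]
    return math.sqrt(sum(d * d for d in residuals) / len(residuals))


# Made-up offsets (ms), NOT results from the paper.
estimated = [10.0, 21.0, 29.0]
reference = [10.0, 20.0, 30.0]
print(round(rmse(estimated, reference), 4))  # 0.8165
```

Because residuals are squared before averaging, RMSE penalizes occasional large misalignments more heavily than a mean absolute error would.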

## Abstract

Accurate spatiotemporal alignment of multi-view video streams is essential for a wide range of dynamic-scene applications such as multi-view 3D reconstruction, pose estimation, and scene understanding. However, synchronizing multiple cameras remains a significant challenge, especially in heterogeneous setups combining professional- and consumer-grade devices, visible and infrared sensors, or systems with and without audio, where common hardware synchronization capabilities are often unavailable. This limitation is particularly evident in real-world environments, where controlled capture conditions are not feasible. In this work, we present a low-cost, general-purpose synchronization method that achieves millisecond-level temporal alignment across diverse camera systems while supporting both visible (RGB) and infrared (IR) modalities. The proposed solution employs a custom-built LED Clock that encodes time through red and infrared LEDs, allowing visual decoding of the exposure window (start and end times) from recorded frames for millisecond-level synchronization. We benchmark our method against hardware synchronization and achieve a residual error of 1.34 ms RMSE across multiple recordings. In further experiments, our method outperforms light-, audio-, and timecode-based synchronization approaches and directly improves downstream computer vision tasks, including multi-view pose estimation and 3D reconstruction. Finally, we validate the system in large-scale surgical recordings involving over 25 heterogeneous cameras spanning both IR and RGB modalities. This solution simplifies and streamlines the synchronization pipeline and expands access to advanced vision-based sensing in unconstrained environments, including industrial and clinical applications.
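The abstract does not specify how the LED Clock encodes time. To illustrate how a visible LED pattern can reveal a frame's exposure start and end, here is a minimal sketch under an assumed, hypothetical encoding: a ring of LEDs in which LED `i` is lit only during the `i`-th slot of a repeating cycle, plus a coarse cycle counter read separately. This is not RocSync's actual encoding, and it ignores the red/IR modality split. The names `decode_exposure`, `fit_frame_clock`, `ExposureWindow`, and `slot_ms` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class ExposureWindow:
    start_ms: float  # decoded exposure start on the clock's timeline
    end_ms: float    # decoded exposure end on the clock's timeline


def decode_exposure(lit: Sequence[bool], cycle_index: int,
                    slot_ms: float = 1.0) -> ExposureWindow:
    """Decode one frame's exposure window from a HYPOTHETICAL LED ring.

    Assumed encoding (not RocSync's): LED i is lit only during slot i of a
    repeating cycle of len(lit) slots, each slot_ms long. cycle_index is the
    cycle in which the exposure starts, read from a separate coarse counter.
    """
    n = len(lit)
    on = [i for i, v in enumerate(lit) if v]
    if not on:
        raise ValueError("no lit LEDs visible in this frame")
    if len(on) == n:
        raise ValueError("exposure spans the whole cycle; start is ambiguous")
    # A lit LED whose circular predecessor is dark marks the start of a run.
    run_starts = [i for i in on if not lit[(i - 1) % n]]
    if len(run_starts) != 1:
        raise ValueError("expected one contiguous (possibly wrapping) run")
    first = run_starts[0]
    start = (cycle_index * n + first) * slot_ms
    # Every slot overlapping the exposure shows as lit, so the decoded window
    # is rounded outward to slot boundaries.
    return ExposureWindow(start, start + len(on) * slot_ms)


def fit_frame_clock(frames: Sequence[int],
                    start_ms: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line start_ms ~ offset + period * frame for one camera."""
    if len(frames) != len(start_ms) or len(frames) < 2:
        raise ValueError("need at least two paired samples")
    n = len(frames)
    mean_f = sum(frames) / n
    mean_t = sum(start_ms) / n
    s_ff = sum((f - mean_f) ** 2 for f in frames)
    if s_ff == 0:
        raise ValueError("frame indices must not all be equal")
    s_ft = sum((f - mean_f) * (t - mean_t) for f, t in zip(frames, start_ms))
    period = s_ft / s_ff
    return mean_t - period * mean_f, period


if __name__ == "__main__":
    # Made-up example: 10-LED ring, LEDs 2..4 lit, exposure starts in cycle 3.
    w = decode_exposure([False, False, True, True, True] + [False] * 5, cycle_index=3)
    print(w)  # ExposureWindow(start_ms=32.0, end_ms=35.0)
```

In this toy scheme the timing resolution is bounded by `slot_ms`, since a partially overlapped slot still reads as lit. Fitting a line per camera turns decoded per-frame start times into a frame-to-clock mapping (`offset + period * frame`), which lets frames from cameras with different frame rates or start times be placed on one shared timeline.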

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12900090/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12900090/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12900090/full.md

---
Source: https://tomesphere.com/paper/PMC12900090