Asynchronous Pipeline Parallelism for Real-Time Multilingual Lip Synchronization in Video Communication Systems

Eren Caglar, Amirkia Rafiei Oskooei, Mehmet Kutanoglu, Mustafa Keles, and Mehmet S. Aktas

arXiv:2512.18318 · cs.MM · December 23, 2025

TL;DR

This paper presents an asynchronous pipeline framework for real-time multilingual lip synchronization in video conferencing. It reduces end-to-end latency by up to 3.1 times compared to sequential approaches and improves efficiency through concurrent module execution, optimized inference, and semantic speech segmentation.

Contribution

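It introduces a parallel, asynchronous Transformer-based architecture that decouples its modules through message queues, optimizes each module's inference workflow (graph compilation, mixed-precision quantization, and kernel fusion), and uses context-adaptive silence detection to segment the input speech stream for real-time lip synchronization.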
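The silence-detection step can be illustrated with a small, hypothetical sketch. The source does not describe the detector's internals, so everything below is an assumption: the function `segment_on_silence`, its `ratio`, `min_silence_frames`, and `alpha` parameters, and the choice of a per-frame energy threshold that adapts to a running noise-floor estimate (a common energy-based technique, not necessarily the authors' method). Speech is cut into a segment once enough consecutive low-energy frames follow it, so downstream modules can begin on that segment while the speaker continues.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Segment:
    start: int  # index of the first speech frame (inclusive)
    end: int    # index one past the last speech frame (exclusive)


def segment_on_silence(
    frame_energies: Sequence[float],
    ratio: float = 2.0,           # hypothetical: speech must exceed ratio x noise floor
    min_silence_frames: int = 3,  # hypothetical: pause length that closes a segment
    alpha: float = 0.05,          # hypothetical: noise-floor adaptation rate
) -> List[Segment]:
    segments: List[Segment] = []
    noise_floor: Optional[float] = None
    seg_start: Optional[int] = None
    silent_run = 0  # consecutive silent frames since the last speech frame
    for i, energy in enumerate(frame_energies):
        if noise_floor is None:
            # Assume the stream opens with background noise; clamp to avoid a zero threshold.
            noise_floor = max(energy, 1e-6)
        if energy < ratio * noise_floor:
            # Silent frame: adapt the noise floor with a clamped exponential moving average.
            noise_floor = max((1 - alpha) * noise_floor + alpha * energy, 1e-6)
            if seg_start is not None:
                silent_run += 1
                if silent_run >= min_silence_frames:
                    # Close the segment at the first frame of the pause.
                    segments.append(Segment(seg_start, i - silent_run + 1))
                    seg_start, silent_run = None, 0
        else:
            silent_run = 0
            if seg_start is None:
                seg_start = i
    if seg_start is not None:
        # Flush a segment still open at the end of the stream, trimming trailing silence.
        segments.append(Segment(seg_start, len(frame_energies) - silent_run))
    return segments
```

Adapting the threshold to the noise floor, rather than fixing it, lets the same detector work across microphones and rooms with different background levels.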

Findings

01. Reduces end-to-end latency by up to 3.1 times compared to sequential approaches
02. Improves processing speed and resource utilization
03. Maintains high translation accuracy and visual quality
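To see why overlapping the modules shortens total processing time, consider an idealized model (an assumption for illustration, not the paper's evaluation method): k stages with constant per-segment times t_1..t_k, N segments available at once, and one worker per stage. Sequential execution takes N · Σt_j, whereas a pipeline finishes after Σt_j + (N − 1) · max t_j, so the gain approaches Σt_j / max t_j and is capped by the slowest stage. The function names and stage times below are hypothetical.

```python
from typing import Sequence


def sequential_makespan(stage_times: Sequence[float], n_segments: int) -> float:
    # Each segment passes through every stage before the next segment starts.
    return n_segments * sum(stage_times)


def pipelined_makespan(stage_times: Sequence[float], n_segments: int) -> float:
    # One worker per stage with unbounded queues: a segment enters stage j once
    # stage j is free and the segment has left stage j - 1.
    if n_segments <= 0 or not stage_times:
        return 0.0
    finish = [0.0] * len(stage_times)  # when each stage last became free
    for _ in range(n_segments):
        prev = 0.0  # time the current segment left the previous stage
        for j, t in enumerate(stage_times):
            start = max(prev, finish[j])
            finish[j] = start + t
            prev = finish[j]
    return finish[-1]


if __name__ == "__main__":
    times = [1.0, 3.0, 2.0]  # hypothetical per-segment stage times
    for n in (1, 10, 100):
        print(n, sequential_makespan(times, n) / pipelined_makespan(times, n))
```

With these hypothetical times the ratio tends toward 6 / 3 = 2 as N grows, which shows why balancing stage costs (for example through per-module inference optimization) matters as much as running the stages concurrently.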

Abstract

This paper introduces a parallel and asynchronous Transformer framework designed for efficient and accurate multilingual lip synchronization in real-time video conferencing systems. The proposed architecture integrates translation, speech processing, and lip-synchronization modules within a pipeline-parallel design that enables concurrent module execution through message-queue-based decoupling, reducing end-to-end latency by up to 3.1 times compared to sequential approaches. To enhance computational efficiency and throughput, the inference workflow of each module is optimized through low-level graph compilation, mixed-precision quantization, and hardware-accelerated kernel fusion. These optimizations provide substantial gains in efficiency while preserving model accuracy and visual quality. In addition, a context-adaptive silence-detection component segments the input speech stream at…
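The message-queue decoupling described in the abstract can be sketched with Python's `asyncio.Queue`. This is a minimal illustration under stated assumptions, not the authors' implementation: `translate`, `synthesize_speech`, and `lip_sync` are hypothetical stubs that only sleep to simulate inference, and the bounded `maxsize` is an assumed backpressure setting. Each stage runs as its own task, reading from one queue and writing to the next, so one segment can be translated while an earlier one is still being lip-synced.

```python
import asyncio
from typing import Awaitable, Callable, List

_END = object()  # sentinel that signals the end of the stream to each stage
Stage = Callable[[str], Awaitable[str]]


async def run_stage(fn: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    # Consume items from inbox, apply fn, and forward results to outbox.
    while True:
        item = await inbox.get()
        if item is _END:
            await outbox.put(_END)  # propagate shutdown downstream
            return
        await outbox.put(await fn(item))


# Hypothetical stand-ins for the translation, speech, and lip-sync modules.
async def translate(segment: str) -> str:
    await asyncio.sleep(0.01)  # simulated inference time
    return f"translated({segment})"


async def synthesize_speech(text: str) -> str:
    await asyncio.sleep(0.01)
    return f"speech({text})"


async def lip_sync(audio: str) -> str:
    await asyncio.sleep(0.01)
    return f"frames({audio})"


async def run_pipeline(segments: List[str], stages: List[Stage], maxsize: int = 4) -> List[str]:
    # One bounded queue between each pair of stages; maxsize applies backpressure.
    queues = [asyncio.Queue(maxsize=maxsize) for _ in range(len(stages) + 1)]
    workers = [
        asyncio.create_task(run_stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]

    async def feed() -> None:
        for seg in segments:
            await queues[0].put(seg)
        await queues[0].put(_END)

    feeder = asyncio.create_task(feed())
    results: List[str] = []
    while (out := await queues[-1].get()) is not _END:
        results.append(out)
    await asyncio.gather(feeder, *workers)
    return results


if __name__ == "__main__":
    stages = [translate, synthesize_speech, lip_sync]
    for frame in asyncio.run(run_pipeline(["seg0", "seg1", "seg2"], stages)):
        print(frame)
```

Because asyncio tasks share one thread, GPU-bound models would need to be offloaded (for example with `asyncio.to_thread`) or run in separate processes for the stages to overlap in practice; the queue structure stays the same.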

Taxonomy

Topics: Speech and Audio Processing · Multimedia Communication and Technology · Digital Filter Design and Implementation