SyncBreaker: Stage-Aware Multimodal Adversarial Attacks on Audio-Driven Talking Head Generation

Wenli Zhang; Xianglong Shi; Sirui Zhao; Xinqi Chen; Guo Cheng; Yifan Xu; Tong Xu; Yong Liao

arXiv:2604.08405 · cs.CV · April 13, 2026

TL;DR

SyncBreaker is a stage-aware multimodal adversarial attack framework that jointly perturbs audio and image inputs to effectively disrupt audio-driven talking head generation while maintaining perceptual quality.
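The idea of jointly perturbing both inputs under a perceptual budget can be illustrated with a minimal sketch. It assumes projected signed-gradient ascent (PGD-style) with a separate L-infinity budget per modality as a stand-in for the paper's modality-specific perceptual constraints, whose exact form is not given here. The names `joint_pgd` and `grad_fn`, the budgets, and the step sizes are hypothetical, and the toy quadratic loss in the usage lines replaces the real attack objective on a talking-head model.

```python
import numpy as np


def project(delta, eps):
    """Clip a perturbation into an L-infinity ball of radius eps."""
    return np.clip(delta, -eps, eps)


def joint_pgd(image, audio, grad_fn, eps_img=8 / 255, eps_aud=0.002,
              step_img=2 / 255, step_aud=0.0005, n_steps=10):
    """Jointly perturb a portrait and an audio waveform (hypothetical sketch).

    grad_fn(img, aud) -> (g_img, g_aud) returns the gradient of the attack
    loss (to be maximized) with respect to each input.
    """
    d_img = np.zeros_like(image)
    d_aud = np.zeros_like(audio)
    for _ in range(n_steps):
        g_img, g_aud = grad_fn(image + d_img, audio + d_aud)
        # Signed-gradient ascent, with a separate step size and budget per modality.
        d_img = project(d_img + step_img * np.sign(g_img), eps_img)
        d_aud = project(d_aud + step_aud * np.sign(g_aud), eps_aud)
        # Keep the perturbed inputs inside their valid value ranges.
        d_img = np.clip(image + d_img, 0.0, 1.0) - image
        d_aud = np.clip(audio + d_aud, -1.0, 1.0) - audio
    return image + d_img, audio + d_aud


# Usage with a toy separable loss ||img - t_img||^2 + ||aud - t_aud||^2.
rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(3, 64, 64))
aud = rng.uniform(-0.5, 0.5, size=16000)
t_img = rng.uniform(0.0, 1.0, size=img.shape)
t_aud = rng.uniform(-0.5, 0.5, size=aud.shape)


def toy_loss(i, a):
    return float(np.sum((i - t_img) ** 2) + np.sum((a - t_aud) ** 2))


def toy_grad(i, a):
    return 2 * (i - t_img), 2 * (a - t_aud)


adv_img, adv_aud = joint_pgd(img, aud, toy_grad)
print(toy_loss(img, aud) < toy_loss(adv_img, adv_aud))  # True
```

Keeping a separate budget per stream lets each modality be tuned to its own perceptibility threshold; a real attack would obtain `grad_fn` by backpropagating through the talking-head generator.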

Contribution

It introduces a multimodal protection method built on stage-aware perturbations: nullifying supervision with Multi-Interval Sampling (MIS) for the portrait stream, and Cross-Attention Fooling (CAF) for the audio stream. The method outperforms single-modality baselines.

Findings

1. SyncBreaker degrades lip sync and facial dynamics more effectively than single-modality attacks.

2. It preserves the perceptual quality of the protected inputs.

3. It remains robust under purification.

Abstract

Diffusion-based audio-driven talking-head generation enables realistic portrait animation, but also introduces risks of misuse, such as fraud and misinformation. Existing protection methods are largely limited to a single modality, and neither image-only nor audio-only attacks can effectively suppress speech-driven facial dynamics. To address this gap, we propose SyncBreaker, a stage-aware multimodal protection framework that jointly perturbs portrait and audio inputs under modality-specific perceptual constraints. Our key contributions are twofold. First, for the image stream, we introduce nullifying supervision with Multi-Interval Sampling (MIS) across diffusion stages to steer the generation toward the static reference portrait by aggregating guidance from multiple denoising intervals. Second, for the audio stream, we propose Cross-Attention Fooling (CAF), which suppresses…
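One plausible reading of nullifying supervision with MIS can be sketched as follows: rather than supervising a single diffusion timestep, draw one timestep from each of several denoising intervals and average a loss that pulls the model's predicted output toward the static reference portrait. Equal-width intervals, one draw per interval, and a mean-squared-error loss are assumptions, since the abstract does not give these details; `predict_x0` is a hypothetical stand-in for the diffusion model's clean-sample prediction at step `t`, and all function names are illustrative.

```python
import numpy as np


def multi_interval_timesteps(num_train_steps, num_intervals, rng):
    """Draw one timestep uniformly from each of `num_intervals` equal-width
    intervals covering [0, num_train_steps) (assumed interval scheme)."""
    if not 1 <= num_intervals <= num_train_steps:
        raise ValueError("need 1 <= num_intervals <= num_train_steps")
    edges = np.linspace(0, num_train_steps, num_intervals + 1).astype(int)
    return np.array([rng.integers(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])])


def nullifying_loss(predict_x0, reference, timesteps):
    """Mean over sampled timesteps of the squared error between the model's
    predicted clean output at step t and the static reference portrait.
    Lowering this loss pulls generation toward the motionless reference."""
    per_step = [np.mean((predict_x0(t) - reference) ** 2) for t in timesteps]
    return float(np.mean(per_step))


# Usage with a toy predictor whose output drifts from the reference as t grows.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 1.0, size=(3, 32, 32))
motion = rng.normal(size=reference.shape)
ts = multi_interval_timesteps(1000, 4, rng)  # one timestep per interval
drifting = lambda t: reference + (t / 1000) * motion
print(ts, nullifying_loss(drifting, reference, ts))
```

Sampling from several intervals, rather than from one stage only, lets a single loss cover both the high-noise steps that typically shape coarse structure and the low-noise steps that refine detail; in an attack, the portrait perturbation would be updated to lower this loss.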

Code & Models

Repository

kitty384/SyncBreaker (GitHub)
