Semantic Noise Reduction via Teacher-Guided Dual-Path Audio-Visual Representation Learning

Linge Wang; Yingying Chen; Bingke Zhu; Lu Zhou; Jinqiao Wang

arXiv:2604.08147 · cs.SD · April 10, 2026

TL;DR

This paper introduces TG-DP, a teacher-guided dual-path framework that decouples reconstruction and alignment in audio-visual learning. The authors report improved zero-shot retrieval and robust representations.

Contribution

TG-DP separates the reconstruction and alignment objectives and adds teacher guidance to the alignment branch, which improves cross-modal alignment and reduces semantic noise in large-scale audio-visual pretraining.

Findings

01

Achieves state-of-the-art zero-shot retrieval performance on AudioSet.

02

Improves video-to-audio retrieval R@1 from 35.2% to 37.4%, a gain of 2.2 percentage points.

03

Maintains semantic robustness with top linear-probe results on AS20K and VGGSound.
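
For readers unfamiliar with the retrieval metric in the findings above, the following is a minimal sketch of how Recall@k (R@1 when k = 1) is commonly computed for paired embeddings. It assumes that row i of the query matrix and row i of the gallery matrix come from the same clip and that cosine similarity is used for ranking. The function name `recall_at_k` and the toy vectors are illustrative and are not taken from the paper or its repository.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, k=1):
    """Fraction of queries whose paired gallery item (same row index) ranks in the top k."""
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T  # shape: (num_queries, num_gallery)
    # Indices of the k most similar gallery items for each query.
    topk = np.argsort(-sim, axis=1)[:, :k]
    targets = np.arange(sim.shape[0])[:, None]
    return float(np.mean(np.any(topk == targets, axis=1)))

# Toy data (hypothetical): three "video" queries and three "audio" gallery items.
video = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
audio = np.array([[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]])

r1 = recall_at_k(video, audio, k=1)
r3 = recall_at_k(video, audio, k=3)
```

In this toy case only the second query retrieves its own pair first, so `r1` is one third, while `r3` is trivially 1.0 because the gallery has only three items.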

Abstract

Recent advances in audio-visual representation learning have shown the value of combining contrastive alignment with masked reconstruction. However, jointly optimizing these objectives in a single forward pass forces the contrastive branch to rely on randomly visible patches designed for reconstruction rather than cross-modal alignment, introducing semantic noise and optimization interference. We propose TG-DP, a Teacher-Guided Dual-Path framework that decouples reconstruction and alignment into separate optimization paths. By disentangling the masking regimes of the two branches, TG-DP enables the contrastive pathway to use a visibility pattern better suited to cross-modal alignment. A teacher model further provides auxiliary guidance for organizing visible tokens in this branch, helping reduce interference and stabilize cross-modal representation learning. TG-DP achieves…
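
The idea of disentangled masking regimes can be illustrated with a small sketch. The abstract does not say how the teacher organizes visible tokens, so the code below makes a loud assumption: the reconstruction path keeps a random subset of tokens, while the alignment path keeps the tokens that a stand-in teacher scores highest. The function names, the keep ratios (0.25 and 0.5), and the random "teacher scores" are all hypothetical and are not taken from the TG-DP implementation.

```python
import numpy as np

def random_visible(num_tokens, keep_ratio, rng):
    """Reconstruction path: keep a uniformly random subset of token indices."""
    n_keep = max(1, int(round(num_tokens * keep_ratio)))
    return np.sort(rng.choice(num_tokens, size=n_keep, replace=False))

def teacher_guided_visible(teacher_scores, keep_ratio):
    """Alignment path (assumed mechanism): keep the highest-scoring token indices."""
    n_keep = max(1, int(round(len(teacher_scores) * keep_ratio)))
    top = np.argsort(-teacher_scores)[:n_keep]
    return np.sort(top)

rng = np.random.default_rng(0)
num_tokens = 16

# Stand-in for per-token relevance scores produced by a teacher model.
teacher_scores = rng.random(num_tokens)

# Each path draws its own visibility pattern, so the contrastive branch
# no longer inherits the random mask chosen for reconstruction.
recon_idx = random_visible(num_tokens, keep_ratio=0.25, rng=rng)
align_idx = teacher_guided_visible(teacher_scores, keep_ratio=0.5)
```

Under these assumptions the two index sets generally differ: `recon_idx` would feed the masked-reconstruction loss and `align_idx` the contrastive loss, each in its own forward pass.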

Code & Models

Repositories

wanglg20/TG-DP (GitHub)
