Metric Learning with Progressive Self-Distillation for Audio-Visual Embedding Learning
Donghuo Zeng, Kazushi Ikeda

TL;DR
This paper introduces an audio-visual embedding learning method that combines a cross-modal triplet loss with progressive self-distillation, so that training exploits the inherent data distributions and improves audio-visual alignment beyond what label guidance alone provides.
Contribution
It proposes a new architecture that integrates progressive self-distillation with triplet loss for enhanced audio-visual embedding learning, capturing complex relationships beyond labels.
Findings
Improved embedding quality demonstrated on benchmark datasets.
Enhanced alignment accuracy between audio and visual modalities.
Outperforms existing label-guided methods in various metrics.
Abstract
Metric learning projects samples into an embedded space, where similarities and dissimilarities are quantified based on their learned representations. However, existing methods often rely on label-guided representation learning, where representations of different modalities, such as audio and visual data, are aligned based on annotated labels. This approach tends to underutilize latent complex features and potential relationships inherent in the distributions of audio and visual data that are not directly tied to the labels, resulting in suboptimal performance in audio-visual embedding learning. To address this issue, we propose a novel architecture that integrates cross-modal triplet loss with progressive self-distillation. Our method enhances representation learning by leveraging inherent distributions and dynamically refining soft audio-visual alignments -- probabilistic alignments…
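The abstract names two ingredients, a cross-modal triplet loss and progressively refined soft audio-visual alignments, without giving formulas in the text available here. The following is a minimal NumPy sketch of how such pieces are commonly written, not the authors' implementation. Every function name, the cosine distance, the margin, the softmax temperature, the linear schedule, and the mixing of label-based targets with the model's own predictions are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def cross_modal_triplet_loss(audio, visual, labels, margin=0.2):
    # Assumed form: audio anchors, visual positives (same label),
    # visual negatives (different label), cosine distance.
    a, v = l2_normalize(audio), l2_normalize(visual)
    dist = 1.0 - a @ v.T                          # shape (N, N)
    same = labels[:, None] == labels[None, :]     # label agreement mask
    # viol[i, j, k] = d(a_i, v_j) - d(a_i, v_k) + margin
    viol = dist[:, :, None] - dist[:, None, :] + margin
    valid = same[:, :, None] & ~same[:, None, :]  # j positive, k negative
    losses = np.maximum(viol, 0.0)[valid]
    return float(losses.mean()) if losses.size else 0.0

def soft_alignment(audio, visual, temperature=0.1):
    # Row-wise softmax over audio-to-visual cosine similarities.
    logits = l2_normalize(audio) @ l2_normalize(visual).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def label_alignment(labels):
    # Hard, label-guided targets: uniform over same-label items.
    same = (labels[:, None] == labels[None, :]).astype(float)
    return same / same.sum(axis=1, keepdims=True)

def progressive_weight(epoch, total_epochs, max_weight=1.0):
    # Hypothetical linear ramp: trust in the model's own predictions grows.
    return max_weight * min(1.0, epoch / max(1, total_epochs))

def self_distillation_loss(audio, visual, labels, teacher_probs, alpha):
    # Mix label targets with the teacher's soft alignments, then take the
    # cross-entropy against the student's current soft alignments.
    target = (1.0 - alpha) * label_alignment(labels) + alpha * teacher_probs
    student = soft_alignment(audio, visual)
    return float(-(target * np.log(student + 1e-12)).sum(axis=1).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(8, 16))
    visual = rng.normal(size=(8, 16))
    labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
    teacher = soft_alignment(audio, visual)  # stand-in for an earlier snapshot
    alpha = progressive_weight(epoch=5, total_epochs=10)
    total = cross_modal_triplet_loss(audio, visual, labels) \
        + self_distillation_loss(audio, visual, labels, teacher, alpha)
    print(total)
```

In this sketch the teacher distribution would come from an earlier snapshot of the same model, and `alpha` starts at zero so that early training is purely label-guided. Whether the paper uses this particular schedule or target mixing is not stated in the truncated abstract.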
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
Methods: Triplet Loss
