Weakly-Supervised Audio-Visual Segmentation

Shentong Mo; Bhiksha Raj

arXiv:2311.15080 · cs.CV · November 28, 2023 · 2 cites

Open Access

TL;DR

This paper introduces WS-AVS, a weakly-supervised framework for audio-visual segmentation that uses instance-level annotations and multi-scale contrastive learning to effectively segment sound sources in videos.

Contribution

The paper proposes a novel weakly-supervised framework, WS-AVS, that leverages multi-scale contrastive learning for audio-visual segmentation with less detailed supervision.

Findings

01. WS-AVS outperforms existing methods on AVSBench.

02. WS-AVS is effective in both single-source and multi-source scenarios.

03. WS-AVS reduces reliance on pixel-wise masks by training with instance-level annotations instead.

Abstract

Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive, manually designed architecture supervised by large numbers of pixel-accurate masks. However, such pixel-level masks are expensive to obtain and not available in all cases. In this work, we aim to simplify the supervision to instance-level annotation, i.e., weakly-supervised audio-visual segmentation. We present a novel Weakly-Supervised Audio-Visual Segmentation framework, namely WS-AVS, that learns multi-scale audio-visual alignment through multi-scale multiple-instance contrastive learning. Extensive experiments on AVSBench demonstrate the effectiveness of WS-AVS for weakly-supervised audio-visual segmentation in both single-source and multi-source scenarios.
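
The abstract names multi-scale multiple-instance contrastive learning but does not give the loss itself. The following is only a rough NumPy sketch of what such an objective could look like, not the authors' implementation. It assumes one audio embedding per clip, a list of visual feature maps at different resolutions that share the audio embedding's channel dimension, max-pooling over spatial locations as the multiple-instance step, and a symmetric InfoNCE loss across the batch. The function names, tensor shapes, and the temperature value are all hypothetical.

```python
import numpy as np


def _log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def mil_contrastive_loss(audio, visual, temperature=0.07, eps=1e-8):
    """Multiple-instance contrastive loss at one scale (illustrative only).

    audio:  (B, D) array, one embedding per clip.
    visual: (B, D, H, W) array, one feature map per clip.
    """
    # L2-normalise along the channel axis so dot products are cosine similarities.
    a = audio / (np.linalg.norm(audio, axis=1, keepdims=True) + eps)
    v = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + eps)

    # sim[i, j, h, w]: similarity of audio i with location (h, w) of frame j.
    sim = np.einsum("id,jdhw->ijhw", a, v)

    # Multiple-instance step (an assumption): each frame is a bag of locations,
    # and the bag score is the best-matching location.
    logits = sim.max(axis=(2, 3)) / temperature  # shape (B, B)

    # Symmetric InfoNCE: the matching audio/frame pair sits on the diagonal.
    idx = np.arange(audio.shape[0])
    audio_to_visual = -_log_softmax(logits, axis=1)[idx, idx].mean()
    visual_to_audio = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return float(audio_to_visual + visual_to_audio)


def multi_scale_mil_loss(audio, visual_pyramid, temperature=0.07):
    """Average the per-scale loss over a list of (B, D, H_s, W_s) feature maps."""
    per_scale = [mil_contrastive_loss(audio, v, temperature) for v in visual_pyramid]
    return float(np.mean(per_scale))
```

Under these assumptions, no pixel-level mask enters the loss: the only supervision is which audio clip belongs to which frame, and the per-location similarity map `sim` is what a segmentation head could later threshold or refine.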

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation

Methods: Contrastive Learning