Collaborative Hybrid Propagator for Temporal Misalignment in Audio-Visual Segmentation

Kexin Li; Zongxin Yang; Yi Yang; Jun Xiao

arXiv:2412.08161·cs.CV·December 12, 2024

Open Access

TL;DR

This paper proposes an audio-visual segmentation framework that addresses temporal misalignment by first identifying the points where the audio's semantics change and then propagating segmentation frame by frame within those boundaries, improving alignment accuracy.

Contribution

The proposed Collaborative Hybrid Propagator Framework (Co-Prop) combines audio boundary detection with frame-by-frame propagation to improve temporal alignment in audio-visual video segmentation (AVVS).

Findings

01. Improves alignment accuracy across three datasets

02. Reduces memory usage compared to traditional methods

03. Can be integrated with existing AVVS approaches

Abstract

Audio-visual video segmentation (AVVS) aims to generate pixel-level maps of sound-producing objects that accurately align with the corresponding audio. However, existing methods often face temporal misalignment, where audio cues and segmentation results are not temporally coordinated. Audio provides two critical pieces of information: i) target object-level details and ii) the timing of when objects start and stop producing sounds. Current methods focus more on object-level information but neglect the boundaries of audio semantic changes, leading to temporal misalignment. To address this issue, we propose a Collaborative Hybrid Propagator Framework (Co-Prop). This framework includes two main steps: Preliminary Audio Boundary Anchoring and Frame-by-Frame Audio-Insert Propagation. To anchor the audio boundary, we employ retrieval-assist prompts with Qwen large language models to identify…
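The two steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it only assumes that the anchoring step yields a list of frame indices where the audio semantics change, and that propagation restarts from a keyframe mask at each such boundary. The names `split_at_boundaries`, `propagate`, `key_masks`, and `step` are hypothetical and introduced here for illustration only.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def split_at_boundaries(num_frames: int, boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into half-open segments at the given change points.

    `boundaries` stands in for the output of an audio boundary detector
    (an assumption; the abstract does not specify its format).
    """
    if num_frames <= 0:
        return []
    # Keep only change points strictly inside the clip, de-duplicated and sorted.
    cuts = sorted({b for b in boundaries if 0 < b < num_frames})
    edges = [0, *cuts, num_frames]
    return list(zip(edges[:-1], edges[1:]))


def propagate(
    frames: Sequence[Any],
    boundaries: Sequence[int],
    key_masks: Dict[int, Any],
    step: Callable[[Any, Any], Any],
) -> List[Optional[Any]]:
    """Propagate a mask frame by frame inside each audio-consistent segment.

    key_masks maps each segment's first frame index to its starting mask
    (hypothetical keyframe output); step(prev_mask, frame) returns the next mask
    and is a placeholder for a real propagation model.
    """
    masks: List[Optional[Any]] = [None] * len(frames)
    for start, end in split_at_boundaries(len(frames), boundaries):
        mask = key_masks[start]  # restart from the keyframe at each boundary
        masks[start] = mask
        for t in range(start + 1, end):
            mask = step(mask, frames[t])
            masks[t] = mask
    return masks


# Toy usage: six frames, the sounding object changes at frame 3,
# and the placeholder step simply carries the previous mask forward.
demo = propagate(
    frames=list(range(6)),
    boundaries=[3],
    key_masks={0: "A", 3: "B"},
    step=lambda prev_mask, frame: prev_mask,
)
```

Restarting propagation at every boundary is the point of the sketch: a mask from before an audio change is never carried across it, which is the kind of temporal misalignment the abstract describes.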

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: ALIGN · Focus