Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing

Yongbiao Gao; Xiangcheng Sun; Guohua Lv; Deng Yu; Sijiu Niu

arXiv:2412.19563 · cs.CV · December 30, 2024

TL;DR

This paper introduces a joint reinforcement learning approach for label denoising in audio-visual video parsing, improving the accuracy of event recognition and boundary detection by integrating denoising directly into the parsing process.

Contribution

The paper proposes a joint reinforcement learning framework for simultaneous label denoising and video parsing, in which an AVVP-validation and feedback mechanism guides the label denoising policy.

Findings

1. Outperforms existing label denoising methods on AVVP tasks.
2. Enhances parsing accuracy when integrated into other AVVP models.
3. Demonstrates effectiveness through extensive experiments.

Abstract

Audio-visual video parsing (AVVP) aims to recognize audio and visual event labels with precise temporal boundaries. This is challenging because an event labeled at the video level may occur in only one modality, audio or visual, while only the overall video-level labels are available. Existing label denoising models often treat denoising as a separate preprocessing step, which disconnects label denoising from the AVVP task. To bridge this gap, we present a novel joint reinforcement learning-based label denoising approach (RLLD). This approach trains the label denoising and video parsing models simultaneously through a joint optimization strategy. We introduce a novel AVVP-validation and soft inter-reward feedback mechanism that directly guides the learning of the label denoising policy. Extensive experiments on AVVP tasks demonstrate the superior performance of our proposed method…
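The abstract describes the denoising policy and its reward feedback only at a high level. The sketch below is a generic illustration of training a label-denoising policy with a policy-gradient (REINFORCE) update; it is not the authors' RLLD implementation. Everything in it is an assumption: the per-(modality, class) Bernoulli keep/drop policy, the hypothetical `soft_reward` that scores decisions against a fixed reference label set (standing in for the AVVP-validation signal that would come from a jointly trained parser), the moving-average baseline, and all names and hyperparameters.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class LabelDenoisingPolicy:
    """Hypothetical policy with one logit per (modality, class).

    sigmoid(logit) is the probability of keeping a video-level label as a
    label for that modality; labels absent at video level are never kept.
    """

    def __init__(self, num_classes, modalities=("audio", "visual")):
        self.logits = {m: [0.0] * num_classes for m in modalities}

    def keep_prob(self, modality, c):
        return sigmoid(self.logits[modality][c])

    def sample(self, video_labels, rng):
        # Sample keep (1) / drop (0) for every modality-class pair.
        return {
            m: [int(y == 1 and rng.random() < sigmoid(z))
                for y, z in zip(video_labels, logits)]
            for m, logits in self.logits.items()
        }

    def update(self, video_labels, actions, advantage, lr):
        # REINFORCE: d/dz log Bernoulli(a; sigmoid(z)) = a - sigmoid(z).
        for m, logits in self.logits.items():
            for c, y in enumerate(video_labels):
                if y == 1:  # only labels the policy actually decided on
                    logits[c] += lr * advantage * (actions[m][c] - sigmoid(logits[c]))


def soft_reward(actions, reference):
    # Hypothetical stand-in for a validation score: the fraction of
    # (modality, class) decisions that match a held-out reference.
    pairs = [(a, r) for m in actions for a, r in zip(actions[m], reference[m])]
    return sum(a == r for a, r in pairs) / len(pairs)


def train(video_labels, reference, steps=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    policy = LabelDenoisingPolicy(len(video_labels))
    baseline = 0.0
    for _ in range(steps):
        actions = policy.sample(video_labels, rng)
        reward = soft_reward(actions, reference)
        advantage = reward - baseline  # baseline reduces gradient variance
        policy.update(video_labels, actions, advantage, lr)
        baseline = 0.9 * baseline + 0.1 * reward  # moving average of rewards
    return policy


# Usage: class 0 is present at video level but only audible, not visible.
# Class 2 is absent at video level, so it is always dropped and not printed.
policy = train([1, 1, 0], {"audio": [1, 1, 0], "visual": [0, 1, 0]})
for m in ("audio", "visual"):
    print(m, [round(policy.keep_prob(m, c), 2) for c in range(2)])
```

In this toy run, training should push the visual keep-probability for class 0 toward zero while the other present labels stay kept. The fractional reward is only one simple way to give the policy a graded rather than binary signal; the abstract does not say how the paper's soft inter-reward is actually defined.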

Taxonomy

Topics: Image and Signal Denoising Methods · Music and Audio Processing · Speech and Audio Processing