Weakly-supervised Visual Instrument-playing Action Detection in Videos

Jen-Yu Liu; Yi-Hsuan Yang; Shyh-Kang Jeng

arXiv:1805.02031·cs.MM·May 8, 2018·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces a weakly-supervised visual method to detect when and where instruments are played in videos, leveraging auxiliary sound and object models to improve localization without extensive manual annotations.

Contribution

It presents a novel weakly-supervised framework that combines sound and object models to localize instrument-playing actions in videos, reducing the need for detailed annotations.

Findings

01. Significant improvement in localization accuracy

02. Effective use of auxiliary models for supervision

03. Validated on a manually annotated dataset

Abstract

Instrument playing is among the most common scenes in music-related videos, which are now one of the largest sources of online video. To understand the instrument-playing scenes in these videos, it is important to know what instruments are played, when they are played, and where the playing actions occur in the scene. While audio-based recognition of instruments has been widely studied, the visual aspect of musical instrument playing remains largely unaddressed in the literature. One of the main obstacles is the difficulty of collecting annotated data on action locations for training-based methods. To address this issue, we propose a weakly-supervised framework to find when and where the instruments are played in the videos. We propose to use two auxiliary models, a sound model and an object model, to provide supervision for training the instrument-playing…

Code & Models

Repositories

ciaua/InstrumentPlayingDetection
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Human Pose and Action Recognition · Music Technology and Sound Studies