Joint Object-Material Category Segmentation from Audio-Visual Cues

Anurag Arnab; Michael Sapienza; Stuart Golodetz; Julien Valentin; Ondrej Miksik; Shahram Izadi; Philip Torr

arXiv:1601.02220 · cs.CV · January 12, 2016 · 1 cites

TL;DR

This paper introduces a joint audio-visual approach to dense object and material segmentation: sparse auditory cues augment dense visual cues, so that objects which look alike but are made of different materials can be told apart.

Contribution

It proposes a multi-output labelling framework that combines visual and auditory cues and optimises object and material labels jointly in a random-field model.

Findings

01. Joint audio-visual cues improve segmentation accuracy

02. The method outperforms visual-only approaches

03. New dataset with paired visual and auditory data is introduced

Abstract

It is not always possible to recognise objects and infer material properties for a scene from visual cues alone, since objects can look visually similar whilst being made of very different materials. In this paper, we therefore present an approach that augments the available dense visual cues with sparse auditory cues in order to estimate dense object and material labels. Since estimates of object class and material properties are mutually informative, we optimise our multi-output labelling jointly using a random-field framework. We evaluate our system on a new dataset with paired visual and auditory data that we make publicly available. We demonstrate that this joint estimation of object and material labels significantly outperforms the estimation of either category in isolation.
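To illustrate why joint estimation can help, here is a minimal sketch in Python. It is not the authors' model: the label sets, all cost values, and the function `label_pixel` are hypothetical, and the sketch labels a single pixel by brute force, omitting the terms between neighbouring pixels that a random field would normally include. It only shows how a sparse audio cue on the material label can, through an object-material compatibility term, change the object label as well.

```python
from itertools import product

# Hypothetical label sets, chosen only for illustration.
OBJECTS = ["mug", "table"]
MATERIALS = ["ceramic", "plastic", "wood"]

# Hypothetical compatibility costs: low for plausible (object, material) pairs.
JOINT_COST = {
    ("mug", "ceramic"): 0.0,
    ("mug", "plastic"): 0.2,
    ("mug", "wood"): 2.0,
    ("table", "ceramic"): 2.0,
    ("table", "plastic"): 0.5,
    ("table", "wood"): 0.0,
}


def label_pixel(obj_cost, mat_cost, audio_cost=None, joint_weight=1.0):
    """Return the (object, material) pair with the lowest joint energy for one pixel."""
    best_pair, best_energy = None, float("inf")
    for obj, mat in product(OBJECTS, MATERIALS):
        # Visual unary costs plus the weighted object-material compatibility cost.
        energy = obj_cost[obj] + mat_cost[mat] + joint_weight * JOINT_COST[(obj, mat)]
        # Audio evidence exists only at the sparse pixels where a sound was recorded.
        if audio_cost is not None:
            energy += audio_cost[mat]
        if energy < best_energy:
            best_pair, best_energy = (obj, mat), energy
    return best_pair


# Visually ambiguous pixel (made-up costs; lower means more likely).
visual_obj = {"mug": 0.6, "table": 0.5}
visual_mat = {"ceramic": 0.7, "plastic": 0.6, "wood": 0.65}
# Made-up audio cue that strongly favours ceramic.
audio_mat = {"ceramic": 0.0, "plastic": 1.0, "wood": 1.5}

independent = label_pixel(visual_obj, visual_mat, joint_weight=0.0)
joint_visual = label_pixel(visual_obj, visual_mat)
joint_audio = label_pixel(visual_obj, visual_mat, audio_cost=audio_mat)
```

With these made-up numbers, labelling each category independently gives `("table", "plastic")`, the joint term alone gives `("table", "wood")`, and adding the audio cue gives `("mug", "ceramic")`: evidence about the material propagates to the object label.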

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Image and Video Retrieval Techniques