ASOD60K: An Audio-Induced Salient Object Detection Dataset for Panoramic Videos
Yi Zhang

TL;DR
This paper introduces ASOD60K, a large-scale dataset for audio-induced salient object detection in panoramic videos, along with benchmarks to advance research in this emerging area.
Contribution
The paper presents ASOD60K, the first large-scale dataset for audio-induced salient object detection in panoramic videos, together with detailed annotations and benchmark evaluations.
Findings
Existing SOD models face challenges with panoramic video data.
Audio cues significantly influence human attention in panoramic scenes.
Benchmark results highlight the need for models designed specifically for panoramic video salient object detection (PV-SOD).
Abstract
Exploring what humans pay attention to in dynamic panoramic scenes is useful for many fundamental applications, including augmented reality (AR) in retail, AR-powered recruitment, and visual language navigation. With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-/object-level saliency detection tasks, we focus on audio-induced salient object detection (SOD), where the salient objects are labeled with the guidance of audio-induced eye movements. To support this task, we collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy, thus distinguishing itself in richness, diversity, and quality. Specifically, each sequence is marked with both its super-/sub-class, with objects of each sub-class being further annotated with…
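The abstract does not spell out how audio-induced eye movements guide the object labels. The following is a minimal sketch of one plausible step, not the ASOD60K annotation protocol. It assumes that gaze is recorded as (longitude, latitude) directions in degrees, that each panoramic frame is stored in equirectangular projection, and that a per-pixel instance map (0 = background) is available. Each fixation is projected onto the frame, and objects that attract at least a given share of the fixations are kept as salient. The function names, the data layout, and the `min_share` threshold are all hypothetical.

```python
from collections import Counter


def fixation_to_pixel(lon_deg, lat_deg, width, height):
    """Map a gaze direction (longitude in [-180, 180], latitude in [-90, 90])
    to (col, row) in an equirectangular frame of size width x height."""
    col = int((lon_deg + 180.0) / 360.0 * width)
    row = int((90.0 - lat_deg) / 180.0 * height)
    # Clamp so that lon = 180 or lat = -90 still lands inside the frame.
    return min(max(col, 0), width - 1), min(max(row, 0), height - 1)


def salient_instances(instance_map, fixations, min_share=0.1):
    """Return the set of instance IDs hit by at least `min_share` of all fixations.

    instance_map: 2D list of ints, 0 = background, k > 0 = object instance k.
    fixations: list of (lon_deg, lat_deg) gaze samples for one frame.
    min_share: hypothetical threshold, not taken from the paper.
    """
    if not fixations:
        return set()
    height, width = len(instance_map), len(instance_map[0])
    hits = Counter()
    for lon, lat in fixations:
        col, row = fixation_to_pixel(lon, lat, width, height)
        obj = instance_map[row][col]
        if obj != 0:  # fixations on background do not vote for any object
            hits[obj] += 1
    total = len(fixations)
    return {obj for obj, n in hits.items() if n / total >= min_share}


# Toy 4 x 8 equirectangular frame with two objects (IDs 1 and 2).
example_map = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 2, 2, 0],
    [0, 1, 1, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# Three fixations fall on object 1, one on object 2, one on background.
example_fixations = [(-112.5, 22.5), (-90.0, 0.0), (-100.0, 10.0), (67.5, 22.5), (0.0, 80.0)]

if __name__ == "__main__":
    print(salient_instances(example_map, example_fixations, min_share=0.3))  # {1}
    print(salient_instances(example_map, example_fixations))  # {1, 2}
```

Counting fixations per instance, rather than thresholding a blurred fixation map, keeps the sketch short. A real pipeline would aggregate gaze over many viewers and frames, and would likely account for the stretching of the equirectangular projection near the poles.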
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Olfactory and Sensory Function Studies
