DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection
Liping Jing, Bo Liu, Jaeyoung Choi, Adam Janin, Julia Bernd, Michael W. Mahoney, and Gerald Friedland

TL;DR
This paper introduces DCAR, a two-phase audio representation method that improves event detection in consumer-produced videos by making audio features more discriminative and compact. It outperforms existing audio representations on the YLI-MED dataset.
Contribution
The paper proposes a new two-phase approach for audio representation that combines GMM modeling with Grassmannian manifold optimization, improving event detection accuracy.
Findings
DCAR outperforms state-of-the-art audio representations.
Significant improvements in both easy and hard discrimination tasks.
Notable accuracy gains on events with lower human annotator confidence.
Abstract
This paper presents a novel two-phase method for audio representation, Discriminative and Compact Audio Representation (DCAR), and evaluates its performance at detecting events in consumer-produced videos. In the first phase of DCAR, each audio track is modeled using a Gaussian mixture model (GMM) that includes several components to capture the variability within that track. The second phase takes into account both global structure and local structure. In this phase, the components are rendered more discriminative and compact by formulating an optimization problem on Grassmannian manifolds, which we found represents the structure of audio effectively. Our experiments used the YLI-MED dataset (an open TRECVID-style video corpus based on YFCC100M), which includes ten events. The results show that the proposed DCAR representation consistently outperforms state-of-the-art audio…
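As a rough illustration of the first phase only, the sketch below fits one GMM to the frame-level features of a single track and returns its component parameters. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature type (13-dimensional frames, as MFCCs would typically be), the number of components, the diagonal covariance, and the helper name fit_track_gmm are all hypothetical choices, and the input is synthetic data rather than real audio. The second phase (the optimization on Grassmannian manifolds) is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_track_gmm(frames, n_components=3, seed=0):
    """Fit one GMM to the frame-level features of a single audio track.

    frames: array of shape (n_frames, n_features).
    Returns the mixture weights, component means, and diagonal covariances.
    The component count and covariance type are illustrative assumptions.
    """
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="diag",
        random_state=seed,
    )
    gmm.fit(frames)
    return gmm.weights_, gmm.means_, gmm.covariances_


# Synthetic stand-in for one track's frame-level features (assumed 13-dim);
# three well-separated clusters mimic within-track variability.
rng = np.random.default_rng(0)
frames = np.vstack(
    [rng.normal(loc=c, scale=1.0, size=(200, 13)) for c in (-4.0, 0.0, 4.0)]
)

weights, means, covs = fit_track_gmm(frames)
print(weights.shape, means.shape, covs.shape)
```

Each track then yields a small set of components (weight, mean, covariance) in place of hundreds of raw frames, which is the kind of per-track summary the second phase would go on to refine.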
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Video Analysis and Summarization
