Learning Multi-modal Similarity

Brian McFee; Gert Lanckriet

arXiv:1008.5163 · cs.AI · September 1, 2010 · 117 cites

Open Access

TL;DR

This paper introduces a multiple kernel learning method that integrates heterogeneous multi-media data into a single, unified similarity space. The learned space is fit to human perceptual similarity judgments, expressed as relative comparisons, and graph-based filtering of those measurements is used to cope with subjectivity and inconsistency.

Contribution

The paper proposes a kernel ensemble technique for multi-modal data integration that conforms to human perceptual similarity and copes with subjective, inconsistent measurements.

Findings

01

Effective integration of multi-modal data into a single similarity space.

02

Robustness achieved through graph-based filtering of similarity measurements.

03

Improved performance in multimedia retrieval tasks.

Abstract

In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multi-media similarity, we develop graph-based techniques to filter…
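To make the idea of relative comparisons over combined kernels concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a fixed non-negative weighted sum of per-modality kernel matrices (the paper learns its ensemble), and it assumes each comparison is a tuple (i, j, k, l) meaning "items i and j are more similar than items k and l". The function names, the toy features, and the weights are all hypothetical.

```python
import numpy as np


def combined_kernel(kernels, weights):
    """Non-negative weighted sum of per-modality kernel matrices (illustrative choice)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative to keep the sum a valid kernel")
    return sum(w * K for w, K in zip(weights, kernels))


def kernel_distance_sq(K, i, j):
    """Squared distance between items i and j in the space induced by kernel matrix K."""
    return K[i, i] + K[j, j] - 2.0 * K[i, j]


def satisfied_fraction(K, comparisons):
    """Fraction of comparisons (i, j, k, l) for which d(i, j) < d(k, l) under K."""
    hits = sum(
        bool(kernel_distance_sq(K, i, j) < kernel_distance_sq(K, k, l))
        for (i, j, k, l) in comparisons
    )
    return hits / len(comparisons)


# Toy data (hypothetical): three items, one feature per modality.
audio = np.array([[0.0], [1.0], [5.0]])
visual = np.array([[0.0], [4.0], [1.0]])
K_audio = audio @ audio.T      # linear kernel on the audio features
K_visual = visual @ visual.T   # linear kernel on the visual features

# One hypothetical judgment: items 0 and 1 are more similar than items 0 and 2.
comparisons = [(0, 1, 0, 2)]

both = satisfied_fraction(combined_kernel([K_audio, K_visual], [1.0, 1.0]), comparisons)
visual_only = satisfied_fraction(combined_kernel([K_audio, K_visual], [0.0, 1.0]), comparisons)
```

In this toy setup the judgment holds when both modalities are weighted equally and fails when only the visual kernel is used, which illustrates why the choice of kernel combination matters for agreeing with perceptual comparisons.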

Taxonomy

Topics: Video Analysis and Summarization · Music and Audio Processing · Advanced Image and Video Retrieval Techniques