# Machine learning methods for multimedia information retrieval

**Authors:** Bálint Zoltán Daróczy

arXiv: 1705.04964 · 2017-05-16

## TL;DR

This thesis explores multimodal feature extraction and similarity kernels for multimedia retrieval and classification, demonstrating their effectiveness across various datasets and proposing future enhancements with complex graph models.

## Contribution

The thesis describes similarity kernel methods for multimedia retrieval and classification, reports that they match and in many cases improve over the state of the art, and suggests extending them to other generative models and to more complex graph structures.

## Key findings

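- With the proper metrics, the similarity kernel reaches and in many cases improves over the state of the art
- Generative models based on instance similarities with multiple modes are generally applicable to classification and regression tasks across various domains
- The Fisher kernel is one of the most powerful kernel functions and could be applied in the future to other widely used generative models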

## Abstract

In this thesis we examined several multimodal feature extraction and learning methods for retrieval and classification. We briefly revisited some theoretical results on learning in Section 2, reviewed several generative and discriminative models in Section 3, and described the similarity kernel in Section 4. We examined different aspects of multimodal image retrieval and classification in Section 5 and suggested methods for identifying quality assessments of Web documents in Section 6. In our last problem we proposed a similarity kernel for time-series-based classification. The experiments were carried out on publicly available datasets, and the source code for the most essential parts is either open source or released.

Since the similarity graphs used (Section 4.2) are greatly constrained for computational reasons, we would like to continue this work with more complex, evolving and capable graphs and apply them to different problems, such as capturing rapid change in the distribution (e.g. session-based recommendation) or complex graphs of the literature work.

The similarity kernel with the proper metrics reaches and in many cases improves over the state of the art. Hence we may conclude that generative models based on instance similarities with multiple modes are generally applicable to classification and regression tasks across various domains, including but not limited to the ones presented in this thesis. More generally, the Fisher kernel is not only unique in many ways but also one of the most powerful kernel functions. Therefore we may exploit the Fisher kernel in the future over widely used generative models, such as Boltzmann Machines [Hinton et al., 1984] (including a particular subset, the Restricted Boltzmann Machines and Deep Belief Networks [Hinton et al., 2006]), Latent Dirichlet Allocation [Blei et al., 2003] or Hidden Markov Models [Baum and Petrie, 1966], to name a few.
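To make the Fisher kernel mentioned in the abstract concrete, here is a minimal sketch on a toy model. It is not the thesis's similarity kernel: it assumes a univariate Gaussian with parameters `(mu, sigma)` as the generative model, and all function names are hypothetical. The kernel is the standard form $K(x, y) = g_x^\top F^{-1} g_y$, where $g_x = \nabla_\theta \log p(x \mid \theta)$ is the score and $F$ is the Fisher information matrix.

```python
import numpy as np


def gaussian_score(x, mu, sigma):
    """Gradient of log N(x | mu, sigma^2) with respect to (mu, sigma)."""
    x = np.asarray(x, dtype=float)
    d_mu = (x - mu) / sigma**2
    d_sigma = ((x - mu) ** 2 - sigma**2) / sigma**3
    # One row per sample, one column per parameter.
    return np.stack([d_mu, d_sigma], axis=-1)


def gaussian_fisher_information(sigma):
    """Closed-form Fisher information matrix for the parameters (mu, sigma)."""
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])


def fisher_kernel(xs, ys, mu=0.0, sigma=1.0):
    """Gram matrix with entries K[i, j] = g(xs[i])^T F^{-1} g(ys[j])."""
    gx = gaussian_score(xs, mu, sigma)
    gy = gaussian_score(ys, mu, sigma)
    f_inv = np.linalg.inv(gaussian_fisher_information(sigma))
    return gx @ f_inv @ gy.T


# Toy data (illustrative only, not taken from the thesis).
samples = np.array([0.0, 1.0, 2.0])
K = fisher_kernel(samples, samples)
print(K)
```

The resulting Gram matrix is symmetric and positive semidefinite, so it can be passed to any kernel method that accepts a precomputed kernel. For richer generative models the score and the Fisher information usually have no closed form and are approximated instead.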

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.04964/full.md

## Figures

30 figures with captions in the complete paper: https://tomesphere.com/paper/1705.04964/full.md

## References

149 references — full list in the complete paper: https://tomesphere.com/paper/1705.04964/full.md

---
Source: https://tomesphere.com/paper/1705.04964