# Alignment Distances on Systems of Bags

**Authors:** Alexander Sagel, Martin Kleinsteuber

arXiv: 1706.04388 · 2017-06-15

## TL;DR

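This paper introduces a kernelized alignment distance as a dissimilarity measure on Systems of Bags, i.e. streams of histograms generated by a kernelized linear dynamic system. The distance copes with the ambiguities in linear dynamic system parameters and improves classification of dynamic scenes and dynamic textures.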

## Contribution

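The paper develops a kernelized alignment distance, computes it with a Jacobi-type method that is proven to converge to a set of critical points, and shows that it outperforms the Martin Distance and the Maximum Singular Value Distance in classifying dynamic scenes and dynamic textures.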

## Key findings

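- Outperforms the Martin Distance and the Maximum Singular Value Distance in every tested classification setting.
- Shows a considerable margin when classification is performed with respect to an abstract mean of video sets; in this scenario it can outperform state-of-the-art techniques such as Dynamic Fractal Spectrum and Orthogonal Tensor Dictionary Learning.
- The Jacobi-type method used to compute the alignment distance is proven to converge to a set of critical points.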
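For orientation, the Martin Distance used as a baseline above is commonly computed from the principal angles between the observability subspaces of two linear dynamic systems. The following is a minimal sketch under stated assumptions, not the paper's implementation: it works on plain (non-kernelized) system matrices, truncates the observability matrix at a finite `horizon`, and all function and variable names are hypothetical.

```python
import numpy as np

def observability(A, C, horizon):
    """Stack C, CA, CA^2, ... up to a finite horizon (a truncation, by assumption)."""
    blocks, M = [], C.copy()
    for _ in range(horizon):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def martin_distance_sq(A1, C1, A2, C2, horizon=20):
    """Finite-horizon approximation: -sum_i log cos^2(theta_i) over principal angles."""
    Q1, _ = np.linalg.qr(observability(A1, C1, horizon))
    Q2, _ = np.linalg.qr(observability(A2, C2, horizon))
    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cosines = np.clip(cosines, 1e-12, 1.0)
    return float(-2.0 * np.sum(np.log(cosines)))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # rescale to spectral radius 0.9
C = rng.standard_normal((6, 3))

# Same system in a different state basis: (P^-1 A P, C P).
# P is unit upper triangular, hence guaranteed invertible.
P = np.eye(3) + np.triu(rng.standard_normal((3, 3)), k=1)
A_same, C_same = np.linalg.solve(P, A @ P), C @ P

# An unrelated second system for comparison.
A_other = rng.standard_normal((3, 3))
A_other *= 0.9 / np.max(np.abs(np.linalg.eigvals(A_other)))
C_other = rng.standard_normal((6, 3))

d_same = martin_distance_sq(A, C, A_same, C_same)
d_other = martin_distance_sq(A, C, A_other, C_other)
print(d_same, d_other)
```

Because a change of state basis leaves the observability subspace unchanged, `d_same` should be zero up to rounding, while `d_other` should be clearly positive.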

## Abstract

Recent research in image and video recognition indicates that many visual processes can be thought of as being generated by a time-varying generative model. A nearby descriptive model for visual processes is thus a statistical distribution that varies over time. Specifically, modeling visual processes as streams of histograms generated by a kernelized linear dynamic system turns out to be efficient. We refer to such a model as a System of Bags. In this work, we investigate Systems of Bags with special emphasis on dynamic scenes and dynamic textures. Parameters of linear dynamic systems suffer from ambiguities. In order to cope with these ambiguities in the kernelized setting, we develop a kernelized version of the alignment distance. For its computation, we use a Jacobi-type method and prove its convergence to a set of critical points. We employ it as a dissimilarity measure on Systems of Bags. As such, it outperforms other known dissimilarity measures for kernelized linear dynamic systems, in particular the Martin Distance and the Maximum Singular Value Distance, in every tested classification setting. A considerable margin can be observed in settings, where classification is performed with respect to an abstract mean of video sets. For this scenario, the presented approach can outperform state-of-the-art techniques, such as Dynamic Fractal Spectrum or Orthogonal Tensor Dictionary Learning.
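The parameter ambiguity and the alignment idea in the abstract can be illustrated with a small toy example. This is a hedged sketch, not the kernelized algorithm from the paper: it assumes the classical non-kernelized setting in which two parameter pairs `(A, C)` describe the same system up to an orthogonal change of state basis `Q`, and it minimizes `||Q^T A1 Q - A2||_F^2 + ||C1 Q - C2||_F^2` by sweeping over coordinate planes and choosing each rotation angle by grid search. The plane-by-plane sweep is Jacobi-like in spirit only; the names `givens`, `cost` and `align` are hypothetical, and reflections are not searched.

```python
import numpy as np

def givens(n, i, j, theta):
    """Rotation by theta in the (i, j) coordinate plane of R^n."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G

def cost(A1, C1, A2, C2, Q):
    """Squared Frobenius misfit after changing the state basis of system 1 by Q."""
    return (np.linalg.norm(Q.T @ A1 @ Q - A2) ** 2
            + np.linalg.norm(C1 @ Q - C2) ** 2)

def align(A1, C1, A2, C2, sweeps=10, grid=720):
    """Plane-by-plane search over rotations; returns (distance, Q).

    Assumption: only rotations are searched, and each angle is picked
    from a fixed grid, so the result is a local estimate at best.
    """
    n = A1.shape[0]
    Q = np.eye(n)
    thetas = np.linspace(-np.pi, np.pi, grid, endpoint=False)
    for _ in range(sweeps):
        for i in range(n - 1):
            for j in range(i + 1, n):
                costs = [cost(A1, C1, A2, C2, Q @ givens(n, i, j, t))
                         for t in thetas]
                Q = Q @ givens(n, i, j, thetas[int(np.argmin(costs))])
    return float(np.sqrt(cost(A1, C1, A2, C2, Q))), Q

rng = np.random.default_rng(0)
A1 = 0.5 * rng.standard_normal((2, 2))
C1 = rng.standard_normal((4, 2))

# Second parameter pair: the same system, expressed in a rotated state basis.
Q0 = givens(2, 0, 1, np.pi / 6)
A2, C2 = Q0.T @ A1 @ Q0, C1 @ Q0

unaligned = float(np.sqrt(cost(A1, C1, A2, C2, np.eye(2))))
aligned, Q = align(A1, C1, A2, C2)
print(unaligned, aligned)
```

Comparing the raw parameters reports a nonzero distance even though both pairs describe the same system, whereas the aligned value should be close to zero. For larger state dimensions such a sweep may stop at a critical point that is not the global minimum, which is consistent with the convergence statement in the abstract.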

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.04388/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1706.04388/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1706.04388/full.md

---
Source: https://tomesphere.com/paper/1706.04388