# Clustering and Recognition of Spatiotemporal Features through Interpretable Embedding of Sequence to Sequence Recurrent Neural Networks

**Authors:** Kun Su, Eli Shlizerman

arXiv: 1905.12176 · 2020-02-03

## TL;DR

This paper introduces an interpretable embedding technique for RNN Seq2Seq models that visualizes and clusters spatiotemporal data, enabling unsupervised analysis and recognition of dynamic sequences like human movements.

## Contribution

The paper presents an embedding approach for RNN Seq2Seq models that supports visualization, interpretation, and unsupervised clustering of spatiotemporal features.

## Key findings

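- Embedding-space projections of the decoder states form clusters that capture similarities and differences in sequence dynamics.
- On time sequences of 3D human body poses, the method yields a high-quality unsupervised categorization of movements.
- The approach applies to time-dependent problems such as temporal segmentation, clustering of dynamic activity, action recognition, and failure prediction.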

## Abstract

Encoder-decoder recurrent neural network models (RNN Seq2Seq) have achieved great success in ubiquitous areas of computation and applications. They have been shown to be successful in modeling data with both temporal and spatial dependencies for translation or prediction tasks. In this study, we propose an embedding approach to visualize and interpret the representation of data by these models. Furthermore, we show that the embedding is an effective method for unsupervised learning and can be utilized to estimate the optimality of model training. In particular, we demonstrate that embedding space projections of the decoder states of an RNN Seq2Seq model trained on sequence prediction are organized in clusters capturing similarities and differences in the dynamics of these sequences. Such performance corresponds to an unsupervised clustering of any spatio-temporal features and can be employed for time-dependent problems such as temporal segmentation, clustering of dynamic activity, self-supervised classification, action recognition, failure prediction, etc. We test and demonstrate the application of the embedding methodology to time-sequences of 3D human body poses. We show that the methodology provides a high-quality unsupervised categorization of movements.
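
The idea of projecting decoder states into a low-dimensional space and reading off clusters can be illustrated with a minimal sketch. The abstract does not name the projection or the clustering algorithm, so this example assumes PCA and k-means as stand-ins, and it uses synthetic vectors in place of real decoder states. The names `pca_project`, `kmeans`, and `fake_states` are hypothetical and not taken from the paper.

```python
import numpy as np


def pca_project(states, n_components=2):
    """Project rows of `states` onto their top principal components."""
    centered = states - states.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Assumption: synthetic stand-ins for decoder hidden states (64-dimensional),
# drawn from two well-separated groups that mimic two kinds of dynamics.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.5, size=(50, 64))
group_b = rng.normal(5.0, 0.5, size=(50, 64))
fake_states = np.vstack([group_a, group_b])

embedded = pca_project(fake_states, n_components=2)  # shape (100, 2)
labels, centers = kmeans(embedded, k=2)
```

With real data, `fake_states` would be replaced by hidden states collected from a trained decoder, and the 2D `embedded` points could be plotted and coloured by `labels` to inspect how sequences group.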

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.12176/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1905.12176/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1905.12176/full.md

---
Source: https://tomesphere.com/paper/1905.12176