Multiple-Kernel Dictionary Learning for Reconstruction and Clustering of Unseen Multivariate Time-series
Babak Hosseini, Barbara Hammer

TL;DR
This paper introduces a novel multiple-kernel dictionary learning method for reconstructing and clustering unseen multivariate time-series data, addressing challenges in recognizing complex motion data.
Contribution
The work proposes a new MKD approach that learns semantic attributes from MTS data, enabling reconstruction and unsupervised clustering of unseen classes.
Findings
Effective reconstruction of unseen MTS data
High accuracy in online clustering of unseen classes
Interpretable semantic attribute learning
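Table 1: Dimension-reconstruction accuracy (DRA) for the selected MTS datasets.
| | Cricket | CMU | Words | Squat |
| --- | --- | --- | --- | --- |
| DRA (%) | 76.4 | 84.5 | 80.2 | 62.6 |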
Babak Hosseini
CITEC cluster of excellence
Bielefeld University, Germany
Barbara Hammer
CITEC cluster of excellence
Bielefeld University, Germany
Preprint of the publication [1], as provided by the authors. The final publication is available at https://www.elen.ucl.ac.be/esann/index.php?pg=proceedings
Abstract
There exist many approaches for the description and recognition of unseen classes in datasets. Nevertheless, the problem becomes challenging when we deal with multivariate time-series (MTS) (e.g., motion data), where we cannot apply vectorial algorithms directly to the inputs. In this work, we propose a novel multiple-kernel dictionary learning (MKD) method which learns semantic attributes based on specific combinations of MTS dimensions in the feature space. Hence, MKD can fully or partially reconstruct the unseen classes based on the training data (seen classes). Furthermore, we obtain sparse encodings for unseen classes based on the learned MKD attributes, upon which we build a simple but effective incremental clustering algorithm to categorize the unseen MTS classes in an unsupervised way. According to the empirical evaluation of our MKD framework on real benchmarks, it provides an interpretable reconstruction of unseen MTS data as well as high performance in their online clustering.
1 Introduction
Zero-shot learning is the problem of recognizing novel categories of data when no prior information is available during the training phase [2, 3, 4]. One practical approach to such transfer learning is the incorporation of semantic attributes as descriptive features to map the input data to an intermediate semantic space, which can discriminate between different unseen categories [3, 4]. Another concern in this area of research is the partial/complete reconstruction of the unseen classes based on their relation to the learned semantic attributes or the training data [5, 6].
An important application of zero-shot learning is multivariate time-series (MTS) in the general sense, such as audio data and human motions [7, 8], which involve a considerable number of unknown classes. Unlike images and video, MTS do not possess any general spatial dependency between their dimensions. Nevertheless, semantic attributes are usually expected to be shared between different classes of an MTS dataset. As an example of MTS data, consider the Cricket Umpire signal Out in Fig. 1, which can be described as "the left hand is raised while the right hand is down". Such an encoding provides a semantic understanding of the data without any prior knowledge of its class label. We can also use such descriptions as semantic attributes to separate unknown MTS samples into distinct categories that reflect their unknown labels. Although the semantic descriptions are class-specific, the individual attributes can be shared among classes that are partially similar to each other.
Sparse coding (SRC) is the idea of reconstructing an input signal as a weighted combination (sparse codes) of a few entries selected from a set of learned bases (the dictionary). Such sparse representations can capture essential intrinsic characteristics of a dataset [9]. Furthermore, by assuming an implicit mapping of the data to a high-dimensional feature space, it is possible to formulate SRC using the kernel representation of the data [10] and thereby also model nonlinear data structures. Consequently, a subset of the existing research has benefited from SRC methods in designing more effective attributes for dealing with unseen classes of data; however, these efforts are mainly limited to image (spatial) and video (spatiotemporal) datasets [6, 11].
Despite the current achievements in learning unseen MTS data, the existing methods either depend on prior information about the novel classes (e.g., samples/labels) [8] or cannot interpret the unseen data based on their learned attributes. Furthermore, to our knowledge, no research has been reported on the partial/complete reconstruction of unseen classes for MTS data in general (e.g., recorded motion signals).
To address the above concerns, we provide the following contributions:
1- We design a novel dictionary structure which learns attributes that can represent MTS based on the dimension level.
2- We propose an unsupervised kernel-based SRC method for partial reconstruction of unseen MTS data in the feature space along with their interpretable encoding.
3- We design an incremental clustering based on the sparse encodings of the unseen data which gradually creates a clustering dendrogram of the unseen classes.
After formulating the problem in Sec. 2, we introduce and explain our proposed framework in Sec. 3, and we evaluate it in Sec. 4, followed by the conclusion.
2 Problem Statement
Presenting a multivariate time-series in the vectorial space, denotes sequence , where represents the number of dimensions and the sequence lengths respectively. The training set belongs to distinct data classes with the label set . Accordingly, the set of unseen MTS belongs to the label set , such that . Based on the above description, we are interested in:
1- Obtaining semantic attributes which create interpretable relations between sequences and the seen classes (Fig. 1).
2- Using the obtained semantic attributes for efficient clustering of the unseen set .
3 Multiple-Kernel Dictionary Learning Framework
As in Fig. 1, real-world MTS data (e.g., human motions) commonly exhibit partial similarities between different data classes when only a subset of their dimensions is considered. These similarities can therefore yield an interpretable description of a novel data sample (from ) via its relation to the seen classes (from ). Such a description also leads to a better clustering of novel data points without any prior information on their class labels. To achieve this, we design a specific multiple-kernel dictionary (MKD) structure which is trained based on and learns semantic attributes similar to Fig. 1-left. Specifically, MKD combines dimensions of similar MTS samples in the feature space under non-negativity constraints. These attributes can encode each unseen as an interpretable description of its dimensions and better separate it from previous (unknown) classes in (Fig. 1-right).
More precisely, we assume there exist non-linear implicit kernel functions that map each dimension of into an individual reproducing kernel Hilbert space (RKHS) [10]. A weighted combination of these kernels with individual coefficients (entries of ) induces an embedding of the data in the feature space as . We can apply this embedding to the whole training data via , and additionally we consider different weighting schemes of the individual kernels as to complement different existing classes in the data. Now, we define our novel multiple-kernel dictionary (MKD) matrix as
[TABLE]
Each dictionary column is a weighted combination of selected dimensions and selected samples from based on the value of and respectively. Due to the relation of to different dimensions of , its columns can learn semantic attributes similar to those of Fig. 1.
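The weighted kernel combination described above can be sketched in a few lines. This is a minimal illustration, assuming the embedding concatenates the per-dimension feature maps scaled by the square roots of their weights, so that inner products reduce to a weighted sum of the per-dimension Gram matrices. The names `kernels`, `beta`, and `u` are hypothetical and not taken from the paper.

```python
import numpy as np

def combined_kernel(kernels, beta):
    """Weighted sum of per-dimension kernel matrices.

    kernels: array of shape (f, N, N), one Gram matrix per MTS dimension.
    beta:    non-negative weights of shape (f,), one per dimension.
    """
    kernels = np.asarray(kernels, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("kernel weights must be non-negative")
    # tensordot contracts the dimension axis: sum_l beta[l] * kernels[l]
    return np.tensordot(beta, kernels, axes=1)

def attribute_norm_sq(kernels, beta, u):
    """Squared feature-space norm of one dictionary column: u^T K_beta u."""
    u = np.asarray(u, dtype=float)
    return float(u @ combined_kernel(kernels, beta) @ u)
```

A weight of zero switches a dimension off for that attribute, which is one way to read "selected dimensions" in the text: the column only depends on the dimensions with non-zero weight and on the samples with non-zero entries in `u`.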
To fit to the data efficiently, we aim for the sparse reconstruction in the feature space based on a sparse matrix of codings . To that end, we propose the following MKD sparse coding framework (MKD-SC) for training the dictionary parameters and sparse codes :
[TABLE]
where denote the -th entry of the -th column of respectively. The loss term in Eq. 1 measures the reconstruction error of the sparse coding based on the Frobenius norm . The term denotes the -norm, which imposes sparsity constraints on the elements of via the constant , so that each is constructed with sparse contributions from . The -norm constraint on prevents the optimization solutions from becoming degenerate [9].
Hence, the dictionary , which results from the optimization problem in Eq. 1, contains attributes (columns), which are weighted combinations of different exemplars and dimensions from . The non-negativity constraints cause similar resources to be combined, which leads to learning semantic attributes for and an interpretable sparse description based on each [12]. In Sec. 3.2 and 3.3, we use this framework to describe and categorize unseen MTS samples.
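To illustrate how a feature-space reconstruction loss can be evaluated without explicit embeddings, the sketch below uses a single-kernel simplification: one Gram matrix `K` for the training data, a non-negative dictionary coefficient matrix `U`, and a code matrix `Gamma`. The per-attribute kernel weights of MKD are not reproduced here, and all three names are assumptions made for this example.

```python
import numpy as np

def kernel_reconstruction_error(K, U, Gamma):
    """Squared Frobenius error ||Phi(X) - Phi(X) U Gamma||_F^2 from K alone.

    K:     (N, N) Gram matrix of the training data.
    U:     (N, k) dictionary coefficients (columns mix training samples).
    Gamma: (k, N) sparse codes, one column per training sample.
    """
    A = U @ Gamma  # (N, N) sample-to-sample reconstruction coefficients
    # Expand the norm: tr(K) - 2 tr(K A) + tr(A^T K A)
    return float(np.trace(K) - 2.0 * np.trace(K @ A) + np.trace(A.T @ K @ A))
```

With a linear kernel, `K = X.T @ X`, the value agrees with the explicit error `np.linalg.norm(X - X @ U @ Gamma) ** 2`, which is a convenient sanity check before switching to non-linear kernels.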
3.1 Optimization Scheme
We optimize the parameters , , and in alternating steps, such that at each update step, we optimize Eq. 1 with respect to one parameter while fixing the others. Based on the dot-product relations , it is possible to rewrite Eq. 1 in terms of each of individually to obtain a general convex form of
[TABLE]
in which are computed without any explicit reference to the embeddings . Such problems can be optimized via the non-negative quadratic pursuit (NQP) algorithm from [13] (https://github.com/bab-git/NQP). Due to the page limit, the details of the reformulation of Eq. 1 and of the optimization steps are given in the online extended version of the paper (https://github.com/bab-git/MKD_Unseen_MTS).
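As a rough illustration of one alternating update, the sketch below solves a non-negative quadratic problem by projected gradient descent. It assumes the subproblem can be written as minimizing 0.5 x^T H x + c^T x subject to x >= 0 with H positive semi-definite. It is a generic stand-in and not the NQP algorithm of [13], and it does not enforce any limit on the number of non-zero entries.

```python
import numpy as np

def nonneg_quadratic_pg(H, c, iters=500):
    """Projected gradient for min 0.5 * x^T H x + c^T x subject to x >= 0.

    H must be symmetric positive semi-definite. Generic illustration only:
    no cardinality (sparsity) constraint is enforced.
    """
    H = np.asarray(H, dtype=float)
    c = np.asarray(c, dtype=float)
    # Step size 1/L, where L is the largest eigenvalue of H
    step = 1.0 / max(np.linalg.eigvalsh(H)[-1], 1e-12)
    x = np.zeros_like(c)
    for _ in range(iters):
        grad = H @ x + c
        x = np.maximum(x - step * grad, 0.0)  # project onto the feasible set
    return x
```

In an alternating scheme, a routine of this kind would be called once per block of variables, with `H` and `c` rebuilt from the kernel matrices while the other blocks are held fixed.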
3.2 Partial Reconstruction of Unseen MTS
In realistic MTS datasets such as human actions, it is expected to observe partial similarities between the dimensions of different classes. Therefore, we define the following error measure for the reconstruction of a selected set of dimensions related to data :
[TABLE]
where , and are modified versions of and the identity matrix respectively via making all the entries zero except the rows corresponding to . Consequently, the learned dictionary can partially reconstruct the unseen time-series for the subset of its dimensions, if is relatively small.
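The idea of restricting the reconstruction error to a subset of dimensions can be sketched as follows. This is an unweighted simplification that sums the per-dimension feature-space errors over the selected dimensions and ignores the kernel weights of the dictionary. The argument names are hypothetical, and `coeffs` stands for the effective weights of the training samples in the reconstruction.

```python
import numpy as np

def partial_reconstruction_error(k_zz, k_zx, K, coeffs, dims):
    """Feature-space reconstruction error restricted to selected dimensions.

    k_zz:   (f,)      self-similarities K_l(z, z) of the unseen sequence z
    k_zx:   (f, N)    similarities K_l(z, X_j) to the N training sequences
    K:      (f, N, N) training Gram matrices, one per dimension
    coeffs: (N,)      weights of the training samples in the reconstruction
    dims:   iterable of dimension indices to evaluate
    """
    total = 0.0
    for l in dims:
        # ||phi_l(z) - Phi_l(X) coeffs||^2 expanded with kernel values
        total += k_zz[l] - 2.0 * k_zx[l] @ coeffs + coeffs @ K[l] @ coeffs
    return float(total)
```

A sequence whose first dimension matches a training sample but whose second dimension matches nothing would give a small error for `dims=[0]` and a large one for `dims=[0, 1]`, which is the partial-reconstruction situation the text describes.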
3.3 Incremental Clustering of Unseen MTS
We propose Algorithm 1 relying on the partial similarity of different MTS classes and the descriptive quality of the learned attributes of MKD. This algorithm incrementally clusters the unseen sequences of into a dendrogram in an online fashion, and also finds the potential sub-clusters among them. To that end, for each unknown MTS sequence , we prepare an encoding matrix , the -th column of which represents the weights of contribution from in the reconstruction of the -th dimension of . Therefore, where denotes the -th entry of the -th column of . This matrix is considered a rich encoded descriptor for the dimensions of based on and is used in Algorithm 1 to compare to the previously categorized unseen data in to find the best place for in the dendrogram. Line 1 of the algorithm finds as the most similar node to based on the distance term , and the intra-cluster distance for each node as , where . Regarding line 1, we choose in our experiments, which results in an acceptable clustering outcome.
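The online flavour of this procedure can be illustrated with a much simpler scheme. The sketch below is a flat, leader-style threshold clustering and not Algorithm 1: it builds no dendrogram and finds no sub-clusters. It assumes each unseen sequence is already described by an encoding matrix, compares matrices with the Frobenius norm, and uses a hypothetical fixed `threshold` in place of the intra-cluster distance test.

```python
import numpy as np

def incremental_cluster(encodings, threshold):
    """Assign each encoding to the nearest cluster mean or open a new cluster.

    encodings: iterable of equally shaped arrays, processed one at a time.
    threshold: maximum Frobenius distance for joining an existing cluster.
    Returns the list of cluster indices in arrival order.
    """
    means, counts, labels = [], [], []
    for enc in encodings:
        enc = np.asarray(enc, dtype=float)
        dists = [np.linalg.norm(enc - m) for m in means]
        if dists and min(dists) <= threshold:
            j = int(np.argmin(dists))
            counts[j] += 1
            means[j] += (enc - means[j]) / counts[j]  # online mean update
        else:
            j = len(means)  # no cluster is close enough: start a new one
            means.append(enc.copy())
            counts.append(1)
        labels.append(j)
    return labels
```

Each sequence is seen exactly once and earlier assignments are never revised, which mirrors the online setting but makes the result depend on the arrival order.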
4 Experiments
To evaluate the performance of our sparse coding framework for representation and discrimination of unseen data, we choose the MTS datasets Cricket Umpire, CMU mocap, Articulatory Words, and Squat with the descriptions provided by [12]. For all the datasets, the Gaussian kernel matrices are computed as , where is the computed pairwise DTW-distance between the -th dimension of and [12] (but can be substituted with any other preferred distance). For tuning and the dictionary size in Eq. 1, we use 5-fold cross-validation.
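The kernel computation can be sketched in plain Python. The example assumes the common Gaussian form exp(-d^2 / sigma) applied to the pairwise DTW distance d of one dimension, with `sigma` a hypothetical bandwidth parameter. The exact scaling used in the paper may differ.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def dimension_kernels(series, sigma=1.0):
    """One Gaussian kernel matrix per dimension.

    series: list of N sequences, each given as a list of f per-dimension lists.
    Assumed kernel form: exp(-d**2 / sigma), with d the DTW distance.
    """
    n, f = len(series), len(series[0])
    kernels = []
    for l in range(f):
        k = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i, n):
                d = dtw_distance(series[i][l], series[j][l])
                k[i][j] = k[j][i] = math.exp(-d * d / sigma)
        kernels.append(k)
    return kernels
```

Because DTW is not a metric, kernels built this way are not guaranteed to be positive semi-definite, so checking the eigenvalues of each matrix before optimization is a reasonable precaution.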
4.1 Partial Reconstruction Results
In order to evaluate the reconstruction quality for each unseen data , we define the dimension-reconstruction accuracy measure as using Eq. 3.
Furthermore, each reconstructed dimension of which satisfies the above threshold is interpreted via the class of data with the largest contribution, as in Sec. 3.2. Table 1 reports the DRA values for the selected MTS datasets, where the CMU and Words datasets have higher DRA values due to their diverse set of training classes, which increases the dimension-level similarity between seen and unseen classes. As an example, we illustrate the dimension-level reconstruction of 2 unseen categories from the Cricket dataset in Fig. 2, in which the No ball class is fully reconstructed via its relation to the movement of the left hand in the Short class and to that of the right hand in the Wide class.
4.2 Incremental Clustering Results
To evaluate the incremental clustering of Sec. 3.3 we use the average clustering error (CE) and normalized mutual information (NMI) [14]. As the most relevant baseline, we choose the self-learning algorithm [8] without its novelty detection part. Besides, we implement the spectral clustering algorithm on the original kernel matrix to compare our framework to the regular clustering of . As another baseline, we also use the NNKSC algorithm [12] as the single-kernel predecessor of MKD-SC, for which the matrix becomes an -dimensional vector.
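For reference, NMI can be computed from two label lists as sketched below. Several normalizations are in use; this example assumes the arithmetic-mean variant, 2 I(T; P) / (H(T) + H(P)), which may differ from the one in [14]. The clustering error is omitted because its exact definition is not restated in this section.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(truth, pred):
    """NMI with arithmetic-mean normalization: 2 * I(T; P) / (H(T) + H(P))."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    pt, pp = Counter(truth), Counter(pred)
    # Mutual information from the joint and marginal label counts
    mi = sum((c / n) * math.log(c * n / (pt[t] * pp[p]))
             for (t, p), c in joint.items())
    denom = entropy(truth) + entropy(pred)
    return 1.0 if denom == 0 else 2.0 * mi / denom
```

The score is invariant to how the clusters are numbered, so a clustering that matches the true classes up to a relabeling still scores 1.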
According to the clustering results in Table 2, the proposed MKD-SC method provides encodings which lead to better clustering of the unseen data than the baselines. Whether spectral clustering outperforms the NNKSC and self-learning methods (e.g., on the Cricket dataset) depends on the discriminative quality of the original kernels. The self-learning method can perform better than NNKSC and spectral clustering when its descriptor-based features discriminate better between the different categories of the unseen classes.
5 Conclusion
In this research, we proposed an unsupervised framework which provides interpretable analysis of unseen classes in MTS datasets. It is built on a novel MKD structure which uses the kernel representations of MTS dimensions to learn semantic attributes. Based on these attributes, our unsupervised MKD-SC framework reconstructs the unseen classes (partially/entirely) in the feature space according to the relation of their dimensions to those of the seen categories, which provides an interpretable description of the novel data. Based on the obtained sparse encodings, we proposed an incremental clustering to gradually categorize novel MTS into distinct clusters. Experiments on real MTS benchmarks show the effectiveness of our MKD-SC framework in obtaining interpretable descriptions for unseen MTS classes. Additionally, the incremental clustering provides better clustering accuracy compared to the baselines.
Acknowledgement
This research was supported by the Cluster of Excellence Cognitive Interaction Technology ’CITEC’ (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).
References
[1] B. Hosseini and B. Hammer. Multiple-kernel dictionary learning for reconstruction and clustering of unseen multivariate time-series. In 27th European Symposium on Artificial Neural Networks (ESANN), 2019.
[2] Ibrahim Alabdulmohsin, Moustapha Cisse, and Xiangliang Zhang. Is attribute-based zero-shot learning an ill-posed strategy? In ECML/PKDD’16, pages 749–760. Springer, 2016.
[3] Christoph H Lampert, Hannes Nickisch, and Stefan Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In CVPR’09, pages 951–958. IEEE, 2009.
[4] Richard Socher, Milind Ganjoo, Christopher D Manning, and Andrew Ng. Zero-shot learning through cross-modal transfer. In Advances in neural information processing systems, pages 935–943, 2013.
[5] Peixi Peng, Yonghong Tian, Tao Xiang, Yaowei Wang, Massimiliano Pontil, and Tiejun Huang. Joint semantic and latent attribute modelling for cross-class transfer learning. TPAMI, 40(7):1625–1638, 2018.
[6] Qiang Qiu, Zhuolin Jiang, and Rama Chellappa. Sparse dictionary-based representation and recognition of action attributes. In ICCV’11, pages 707–714. IEEE, 2011.
[7] Heng-Tze Cheng, Feng-Tso Sun, Martin Griss, Paul Davis, and Jianguo Li. Nuactiv: Recognizing unseen new activities using semantic attribute-based learning. In MobiSys’13, pages 361–374. ACM, 2013.
[8] Di Lu, Junqi Guo, and Xi Zhou. Self-learning based motion recognition using sensors embedded in a smartphone for mobile healthcare. In WASA’16, pages 343–355. Springer, 2016.
