Pair-Wise Cluster Analysis
David R. Hardoon, Kristiaan Pelckmans

TL;DR
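This paper introduces an approach to finding clusters that are consistently present across different representations of the same data. It treats cross-representation correspondence as a form of labeling and exploits a relation to Canonical Correlation Analysis (CCA), combining a theoretical analysis with a practical kernel-based algorithm demonstrated on multilingual document data.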
Contribution
It presents a novel problem setting for cross-representation clustering, analyzes it using PAC-Bayesian theory, and develops a kernel-based algorithm that exploits the relation to CCA, with an extension to multiple views.
Findings
A PAC-Bayesian analysis characterizes the proposed problem setting.
The kernel-based algorithm finds clusters that have counterparts in the alternative representation.
A content-based information retrieval case study on the multilingual aligned Europarl dataset supports the approach.
Abstract
This paper studies the problem of learning clusters which are consistently present in different (continuous-valued) representations of observed data. Our setup differs slightly from the standard approach of (co-)clustering, as we use the fact that some form of 'labeling' becomes available in this setup: a cluster is only interesting if it has a counterpart in the alternative representation. The contribution of this paper is twofold: (i) the problem setting is explored and an analysis in terms of the PAC-Bayesian theorem is presented, and (ii) a practical kernel-based algorithm is derived exploiting the inherent relation to Canonical Correlation Analysis (CCA), as well as its extension to multiple views. A content-based information retrieval (CBIR) case study is presented on the multilingual aligned Europarl document dataset, which supports the above findings.
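As a rough illustration of the CCA relation mentioned in the abstract, the sketch below implements standard two-view regularized kernel CCA with NumPy. It is not the authors' pair-wise clustering algorithm. The regularizer (K + κI)², the function names `rbf_kernel`, `center_kernel` and `kernel_cca`, and the parameter `kappa` are illustrative assumptions. Substituting u = (K_x + κI)α reduces the problem to an SVD of R_x R_y with R = (K + κI)⁻¹K, whose singular values are the regularized canonical correlations.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Clip tiny negative distances caused by floating-point error.
    return np.exp(-gamma * np.maximum(d2, 0.0))


def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kernel_cca(Kx, Ky, kappa=0.1, n_components=2):
    """Regularized kernel CCA for two views (illustrative sketch).

    Maximizes alpha' Kx Ky beta subject to
    alpha' (Kx + kappa I)^2 alpha = 1 and beta' (Ky + kappa I)^2 beta = 1.
    Returns the top regularized correlations and the dual weights.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    # R = (K + kappa I)^{-1} K shrinks each kernel eigenvalue to lam / (lam + kappa).
    Rx = np.linalg.solve(Kx + kappa * I, Kx)
    Ry = np.linalg.solve(Ky + kappa * I, Ky)
    # Singular vectors of Rx Ry give u = (Kx + kappa I) alpha, v = (Ky + kappa I) beta.
    U, s, Vt = np.linalg.svd(Rx @ Ry)
    k = n_components
    alpha = np.linalg.solve(Kx + kappa * I, U[:, :k])
    beta = np.linalg.solve(Ky + kappa * I, Vt[:k].T)
    return s[:k], alpha, beta
```

Given centered kernels `Kx` and `Ky` for two aligned views (for example, two language versions of the same documents), the paired projections are `Kx @ alpha` and `Ky @ beta`. Clustering those projections, for instance with k-means, is one hypothetical way to look for clusters that have counterparts in both views; the paper's own procedure may differ.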
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Image Retrieval and Classification Techniques
