Soft Uncoupling of Markov Chains for Permeable Language Distinction: A New Algorithm
Richard Nock, Pascal Vaillant, Frank Nielsen, Claudia Henry

TL;DR
This paper introduces a novel spectral clustering extension that employs soft, probabilistic assignments and a new Markov chain construction to accurately distinguish languages with permeable borders in an unsupervised manner.
Contribution
It presents a new algorithm that replaces hard clustering with soft probabilistic assignments and introduces a novel Markov chain construction for language distinction.
Findings
Accurately distinguishes languages with permeable borders
Provides visually appealing soft language distinctions
Outperforms traditional spectral clustering methods
Abstract
Without prior knowledge, distinguishing different languages may be a hard task, especially when their borders are permeable. We develop an extension of spectral clustering, a powerful unsupervised classification toolbox, that is shown to accurately resolve the task of soft language distinction. At the heart of our approach, we replace the usual hard membership assignment of spectral clustering with a soft, probabilistic assignment, which also has the advantage of bypassing a well-known complexity bottleneck of the method. Furthermore, our approach relies on a novel, convenient construction of a Markov chain out of a corpus. Extensive experiments with a readily available system clearly display the potential of the method, which brings a visually appealing soft distinction of the languages that together may define a whole corpus.
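To make the two ingredients named in the abstract more concrete (a Markov chain built from a corpus, and soft rather than hard spectral memberships), here is a minimal illustrative sketch. It is not the authors' construction, which the abstract does not spell out. It assumes a symmetrized word-bigram count matrix as the affinity, a row-normalized transition matrix as the chain, and a min-max rescaling of the second eigenvector as a two-way soft membership. The function names `bigram_affinity` and `soft_two_way_membership` and the toy corpus are hypothetical.

```python
import numpy as np


def bigram_affinity(tokens):
    """Symmetric bigram co-occurrence counts over the vocabulary.

    Assumption: adjacency in the token stream stands in for the paper's
    (unspecified here) corpus-to-Markov-chain construction.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    W = np.zeros((len(vocab), len(vocab)))
    for a, b in zip(tokens, tokens[1:]):
        # Count each bigram in both directions so that W is symmetric
        # and the resulting chain has real eigenvalues.
        W[index[a], index[b]] += 1.0
        W[index[b], index[a]] += 1.0
    return vocab, W


def soft_two_way_membership(W):
    """Return the transition matrix P and a soft membership in [0, 1].

    Assumption: the membership is a rescaled second eigenvector of P,
    a generic spectral heuristic rather than the paper's assignment rule.
    Requires every token to have at least one neighbor (nonzero degree).
    """
    d = W.sum(axis=1)
    P = W / d[:, None]                    # row-stochastic Markov chain
    S = W / np.sqrt(np.outer(d, d))       # symmetric matrix similar to P
    _, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    v = eigvecs[:, -2] / np.sqrt(d)       # second right eigenvector of P
    membership = (v - v.min()) / (v.max() - v.min())
    return P, membership


# Toy corpus: two three-word "languages" joined by a single transition.
tokens = "a b c a c b a x y z x z y x".split()
vocab, W = bigram_affinity(tokens)
P, membership = soft_two_way_membership(W)
for word, m in zip(vocab, membership):
    print(f"{word}: {m:.3f}")
```

Because the sign of an eigenvector is arbitrary, which group ends up near 0 and which near 1 can differ between runs or platforms; only the separation between the two groups is meaningful. In this toy corpus, the two words that sit on the border between the groups (`a` and `x`) receive less extreme values than the others, which is the kind of graded behavior a soft assignment is meant to expose.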
Taxonomy
Topics: Text and Document Classification Technologies · Blind Source Separation Techniques · Face and Expression Recognition
