Soft Uncoupling of Markov Chains for Permeable Language Distinction: A New Algorithm

Richard Nock; Pascal Vaillant; Frank Nielsen; Claudia Henry

arXiv:0810.1261 · cs.CL · October 8, 2008

Open Access

TL;DR

This paper introduces a novel spectral clustering extension that employs soft, probabilistic assignments and a new Markov chain construction to accurately distinguish languages with permeable borders in an unsupervised manner.

Contribution

It presents a new algorithm that replaces hard clustering with soft probabilistic assignments and introduces a novel Markov chain construction for language distinction.

Findings

01. Accurately distinguishes languages with permeable borders
02. Provides visually appealing soft language distinctions
03. Outperforms traditional spectral clustering methods

Abstract

Without prior knowledge, distinguishing different languages may be a hard task, especially when their borders are permeable. We develop an extension of spectral clustering, a powerful unsupervised classification toolbox, that is shown to accurately solve the task of soft language distinction. At the heart of our approach, we replace the usual hard membership assignment of spectral clustering with a soft, probabilistic assignment, which also has the advantage of bypassing a well-known complexity bottleneck of the method. Furthermore, our approach relies on a novel, convenient construction of a Markov chain from a corpus. Extensive experiments with a readily available system clearly show the potential of the method, which yields a visually appealing soft distinction of the languages that together may make up a whole corpus.
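The abstract does not spell out the authors' Markov chain construction or their soft assignment rule, so the Python sketch below only illustrates the general idea of "spectral clustering with soft membership" using generic stand-ins. The assumptions are: text segments are featurized as character-bigram counts (a hypothetical choice); the Markov chain is the standard random-walk chain P = D^-1 W on cosine affinities W (not the paper's novel construction); the embedding uses the top-k right eigenvectors of P; and the usual hard k-means step is replaced by soft k-means with a hypothetical inverse temperature `beta`. All function names are illustrative.

```python
import numpy as np


def bigram_profiles(segments):
    """Character-bigram count matrix X (segments x bigrams). Hypothetical featurization."""
    vocab, rows = {}, []
    for s in segments:
        counts = {}
        for a, b in zip(s, s[1:]):
            j = vocab.setdefault(a + b, len(vocab))
            counts[j] = counts.get(j, 0) + 1
        rows.append(counts)
    X = np.zeros((len(segments), len(vocab)))
    for i, counts in enumerate(rows):
        for j, c in counts.items():
            X[i, j] = c
    return X


def random_walk(X):
    """Cosine affinities W, degrees d and the standard random-walk chain P = D^-1 W."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes every segment has >= 2 chars
    W = Xn @ Xn.T          # non-negative because bigram counts are non-negative
    d = W.sum(axis=1)      # d_i >= W_ii = 1, so no division by zero below
    P = W / d[:, None]     # row-stochastic transition matrix
    return W, d, P


def spectral_embedding(W, d, k):
    """Row-normalised top-k right eigenvectors of P, computed through the symmetric
    matrix S = D^-1/2 W D^-1/2, which has the same eigenvalues as P."""
    s = 1.0 / np.sqrt(d)
    S = s[:, None] * W * s[None, :]
    _, U = np.linalg.eigh(S)              # eigenvalues in ascending order
    V = s[:, None] * U[:, ::-1][:, :k]    # if S u = lam u, then P (D^-1/2 u) = lam D^-1/2 u
    return V / np.linalg.norm(V, axis=1, keepdims=True)


def soft_assign(Y, k, beta=10.0, iters=50):
    """Soft k-means: R[i, c] proportional to exp(-beta * ||y_i - mu_c||^2).
    A generic stand-in for a probabilistic assignment, not the paper's rule."""
    # Deterministic farthest-point initialisation of the k centres.
    idx = [0]
    for _ in range(1, k):
        d2 = ((Y[:, None, :] - Y[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    mu = Y[idx].copy()
    for _ in range(iters):
        d2 = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)             # each row is a distribution over clusters
        mu = (R.T @ Y) / R.sum(axis=0)[:, None]       # responsibility-weighted centres
    return R


# Toy corpus: two "languages" with disjoint character bigrams.
segments = ["aaaa aaaa", "aaa aaa", "bbbb bbbb", "bbb bbb"]
X = bigram_profiles(segments)
W, d, P = random_walk(X)
R = soft_assign(spectral_embedding(W, d, k=2), k=2)
print(np.round(R, 3))
```

On this toy corpus the first two segments should put almost all of their membership mass on one cluster and the last two on the other. Segments whose bigrams overlap both groups would be expected to receive intermediate memberships, which is the kind of soft output the abstract describes; `beta` controls how sharp those memberships are.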

Taxonomy

Topics: Text and Document Classification Technologies · Blind Source Separation Techniques · Face and Expression Recognition