How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining

Nicolas Bourgeois (SAMM); Marie Cottrell (SAMM); Benjamin Déruelle (LAMOP); Stéphane Lamassé (LAMOP); Patrick Letrémy (SAMM)

arXiv:1506.07732·math.ST·June 26, 2015

TL;DR

This paper improves the robustness of Kohonen maps and enriches factorial-analysis displays for text mining by introducing the concept of fickle words and applying graph algorithms, with the aim of better classification and interpretability.

Contribution

It introduces the notion of fickle words, derived from the non-deterministic neighborhoods of stochastic Kohonen maps, and uses graph techniques to make the maps more robust and to add information to factorial-analysis displays in text mining.

Findings

01

Fickle words are the words that seem to play a special role in the vocabulary.

02

Enhanced Kohonen maps improve text classification.

03

Graph algorithms aid in word classification.

Abstract

This article is an extended version of a paper presented at the WSOM'2012 conference [1]. We present a combination of factorial projections, the SOM algorithm and graph techniques applied to a text mining problem. The corpus contains 8 medieval manuscripts which were used to teach arithmetic techniques to merchants. Among the techniques for Data Analysis, those used for Lexicometry (such as Factorial Analysis) highlight the discrepancies between manuscripts. The reason for this is that they focus on the deviation from independence between words and manuscripts. Still, we also want to discover and characterize the vocabulary common to the whole corpus. Using the properties of stochastic Kohonen maps, which define neighborhood between inputs in a non-deterministic way, we highlight the words which seem to play a special role in the vocabulary. We call them fickle and use them to…
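
As an illustration of how non-deterministic neighborhoods might be turned into a fickleness measure, the following Python sketch trains a small online SOM several times with different random draws, records how often each pair of inputs lands in the same or adjacent units, and counts, for each input, the pairs whose neighbor frequency is neither close to 0 nor close to 1. This is only a sketch under stated assumptions, not the authors' procedure: the function names (`train_som`, `neighbor_frequency`, `fickle_scores`), the grid size, the learning-rate and radius schedules, and the `low`/`high` thresholds are all hypothetical choices.

```python
import numpy as np


def train_som(data, rows, cols, n_iter, rng):
    """Train a small online SOM; return each input's best-matching unit and the grid."""
    n = data.shape[0]
    # Grid coordinates of the units, one row per unit.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the code vectors on randomly drawn inputs (assumed initialisation).
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1.0 - frac) + 0.01         # decreasing learning rate (assumed schedule)
        sigma = sigma0 * (1.0 - frac) + 0.5    # shrinking neighborhood width (assumed schedule)
        x = data[rng.integers(0, n)]           # random draw: the source of non-determinism
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighborhood on the grid
        weights += lr * h[:, None] * (x - weights)
    dists = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), grid


def neighbor_frequency(data, rows=3, cols=3, n_runs=20, n_iter=500, radius=1.0, seed=0):
    """Fraction of runs in which each pair of inputs falls in the same or adjacent units."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    counts = np.zeros((n, n))
    for _ in range(n_runs):
        bmus, grid = train_som(data, rows, cols, n_iter, rng)
        pos = grid[bmus]
        # Chebyshev distance between the units of every pair of inputs.
        grid_dist = np.abs(pos[:, None, :] - pos[None, :, :]).max(axis=2)
        counts += grid_dist <= radius
    return counts / n_runs


def fickle_scores(freq, low=0.2, high=0.8):
    """Per input, count the pairs that are neighbors in some runs but not consistently."""
    unstable = (freq > low) & (freq < high)    # hypothetical thresholds
    np.fill_diagonal(unstable, False)
    return unstable.sum(axis=1)
```

Applied to a words-by-manuscripts matrix of row profiles, inputs with a high score would be candidates for closer inspection; the thresholds and the grid size would need tuning to the corpus.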
