Classification dynamique d'un flux documentaire : une évaluation statique préalable de l'algorithme GERMEN (Dynamic clustering of a document stream: a preliminary static evaluation of the GERMEN algorithm)
Alain Lelu (LASELDI), Pascal Cuxac (INIST), Joel Johansson (INIST)

TL;DR
This paper gives a preliminary static evaluation of GERMEN, a density-based clustering algorithm for document streams. The algorithm targets qualitative goals (weak-signal detection and tracking of topical evolution), and its results are reported to be independent of data presentation order and initialization.
Contribution
The paper contributes a preliminary static evaluation of the GERMEN clustering algorithm, examining weak-signal detection and cluster stability ahead of its use on dynamic document streams.
Findings
Effective detection of weak signals in geotechnics data
Robustness to data presentation order and initialization
Successful static assessment over one year of data
Abstract
Data-stream clustering is an ever-expanding subdomain of knowledge extraction. Most past and present research aims at scaling up efficiently to huge data repositories. Our approach focuses on qualitative improvement, mainly for weak-signal detection and precise tracking of topical evolution in the framework of information watch, though scalability is intrinsically guaranteed in a possibly distributed implementation. Our GERMEN algorithm exhaustively picks up the whole set of density peaks of the data at time t by identifying the local perturbations induced by the current document vector, such as changing cluster borders or new or vanishing clusters. Optimality follows from the uniqueness 1) of the density landscape for any value of our zoom parameter, and 2) of the cluster allocation performed by our border propagation rule. This results in a rigorous independence from…
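To make the idea of density peaks and order independence concrete, here is a minimal static sketch in Python. It is not the GERMEN algorithm: the incremental updates and the border propagation rule are not reproduced, and a simple "follow the densest neighbour" rule is swapped in for cluster allocation. The function names, the use of cosine similarity, and the neighbourhood size `k` (standing in loosely for a zoom parameter) are all assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density_peak_labels(vectors, k):
    """Label each vector with the index of the density peak it climbs to.

    Hypothetical illustration only, not the authors' method.
    """
    n = len(vectors)
    sims = [[cosine(u, v) for v in vectors] for u in vectors]
    # k nearest neighbours of each point (itself excluded), most similar first.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: -sims[i][j])[:k]
        for i in range(n)
    ]
    # Assumed density: sum of similarities to the k nearest neighbours.
    density = [sum(sims[i][j] for j in neighbours[i]) for i in range(n)]
    # Each point links to the densest point among itself and its neighbours.
    # A point that links to itself is a density peak. On ties the point keeps
    # itself, so links only ever go to strictly denser points (no cycles).
    parent = [max([i] + neighbours[i], key=lambda j: density[j]) for i in range(n)]
    labels = []
    for i in range(n):
        node = i
        while parent[node] != node:  # climb to the peak
            node = parent[node]
        labels.append(node)
    return labels


def as_partition(vectors, labels):
    """Express a labelling as a set of clusters, ignoring input order."""
    groups = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(lab, set()).add(vec)
    return {frozenset(g) for g in groups.values()}


def unit(deg):
    """Toy 2-D 'document vector' pointing at the given angle in degrees."""
    rad = math.radians(deg)
    return (math.cos(rad), math.sin(rad))


# Two well-separated toy groups of document vectors.
docs = [unit(a) for a in (0, 4, 9, 15, 70, 76, 81, 85)]
partition = as_partition(docs, density_peak_labels(docs, k=2))

# Present the same documents in a different order and compare partitions.
shuffled = docs[:]
random.Random(1).shuffle(shuffled)
same = as_partition(shuffled, density_peak_labels(shuffled, k=2)) == partition
print(len(partition), same)
```

In this sketch the partition depends only on the similarities between documents, so reordering the input leaves it unchanged as long as there are no exact ties in similarity or density. Handling ties deterministically, and updating the clusters locally as each new document arrives, is where a real stream algorithm would need more care.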
Taxonomy
Topics: Rough Sets and Fuzzy Logic
