Document stream clustering: experimenting an incremental algorithm and AR-based tools for highlighting dynamic trends
Alain Lelu (LASELDI), Martine Cadot, Pascal Cuxac (INIST)

TL;DR
This paper presents an incremental clustering algorithm and association-rule (AR) based tools for analyzing and visualizing dynamic trends in document streams, addressing the stability and cognitive challenges of dynamic data mining.
Contribution
It introduces a stable, order-independent density-based clustering method and a rule selection process for understanding data stream dynamics.
Findings
Successfully applied to a 2-year, 2600-document scientific database
Demonstrated stable clustering over time, independent of initial conditions and of the ordering of the data stream
Enabled visualization of evolving trends and relationships
Abstract
We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm, independent of any initial conditions and of the ordering of the data-vector stream; 2) the cognitive challenge: we have implemented a stringent selection process of association rules between clusters at time t-1 and time t, for directly generating the main conclusions about the dynamics of a data stream. We illustrate these points with an application to a scientific information database covering two years and 2600 documents.
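To make the idea of association rules between clusters at time t-1 and time t more concrete, here is a minimal sketch in Python. It is not the authors' selection process, which the abstract does not detail. It assumes plain support and confidence thresholds, and the function name `cluster_transition_rules`, the parameters `min_support` and `min_confidence`, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import product


def cluster_transition_rules(prev, curr, min_support=0.1, min_confidence=0.6):
    """Return rules (a, b, support, confidence) read as
    'document in cluster a at t-1 => document in cluster b at t'.

    prev, curr: dicts mapping document id -> set of cluster labels.
    Only documents present in both snapshots are counted.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    shared = prev.keys() & curr.keys()
    n = len(shared)
    if n == 0:
        return []

    antecedent = Counter()  # documents per cluster at t-1
    joint = Counter()       # documents per (cluster at t-1, cluster at t) pair
    for doc in shared:
        for a in prev[doc]:
            antecedent[a] += 1
        for a, b in product(prev[doc], curr[doc]):
            joint[(a, b)] += 1

    rules = []
    for (a, b), count in joint.items():
        support = count / n
        confidence = count / antecedent[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))

    # Strongest rules first; ties broken by label for a deterministic order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Toy snapshots: d5 arrives at time t and is ignored when counting transitions.
prev = {"d1": {"A"}, "d2": {"A"}, "d3": {"A"}, "d4": {"B"}}
curr = {"d1": {"X"}, "d2": {"X"}, "d3": {"Y"}, "d4": {"Y"}, "d5": {"Y"}}

for a, b, support, confidence in cluster_transition_rules(prev, curr):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With these toy snapshots, the rule A -> Y is filtered out because only one of the three documents in A moves to Y, while A -> X and B -> Y are kept. A stricter selection, as the abstract suggests, would presumably add further statistical filtering on top of such thresholds.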
Taxonomy
Topics: Data Mining Algorithms and Applications · Time Series Analysis and Forecasting · Data Stream Mining Techniques
