A Flexible Adaptive Stable Clustering Algorithm for Archive-Scale Online Mass Spectrometry
Shao Shi, Xin Yang, Huiran Feng, Jianhuai Ye, Tianlong Hu, Yaling Zeng, Tzung-May Fu, Lei Zhu, Huizhong Shen, Chen Wang, Shu Tao

TL;DR
The paper introduces FASC, a scalable and stable clustering algorithm for archive-scale online mass spectrometry data. It reports linear runtime scaling and high clustering accuracy, which supports detailed environmental analysis.
Contribution
FASC decouples the similarity kernel from the optimization logic, which the authors report gives deterministic, order-independent convergence and scalability on massive environmental mass spectrometry datasets.
Findings
Achieved >99.5% cluster purity and an Adjusted Rand Index of 0.99 on canonical machine-learning ground-truth datasets.
Demonstrated linear runtime scaling (O(N)) on 25 million spectra.
Successfully mapped atmospheric aerosol aging pathways and identified rare industrial tracers.
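For readers unfamiliar with the two accuracy metrics quoted in the findings, the following is a minimal sketch of how cluster purity and the Adjusted Rand Index are conventionally computed from a ground-truth labeling and a predicted clustering. It is not taken from the paper: the function names `cluster_purity` and `adjusted_rand_index` are hypothetical, the toy labels are illustrative, and the code assumes at least two samples.

```python
from collections import Counter
from math import comb


def cluster_purity(labels_true, labels_pred):
    """Fraction of points that share the majority true class of their cluster."""
    # Count true-class membership inside every predicted cluster.
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    majority_total = sum(max(c.values()) for c in clusters.values())
    return majority_total / len(labels_true)


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting agreement between two labelings, corrected for chance."""
    n = len(labels_true)  # assumed to be >= 2
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of point pairs that fall together in both, or in each, labeling.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical toy labels: cluster IDs are permuted but the grouping is identical.
    truth = [0, 0, 0, 1, 1, 1]
    pred = [1, 1, 1, 0, 0, 0]
    print(cluster_purity(truth, pred), adjusted_rand_index(truth, pred))
```

Both metrics ignore the numeric values of the cluster IDs, so a clustering that recovers the true groups under different labels still scores 1.0. Purity alone can be inflated by over-splitting into many small clusters, which is one reason it is usually reported alongside a chance-corrected score such as the Adjusted Rand Index.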
Abstract
Modern online mass spectrometry generates multi-terabyte data streams critical for understanding Earth's environmental systems. However, extracting actionable chemical insights from these repositories is impeded by a computational bottleneck: existing clustering methods force a compromise among scalability, metric flexibility, and algorithmic stability. Here, we introduce Flexible Adaptive Stable Clustering (FASC), a dynamical systems framework that resolves these constraints by architecturally decoupling the similarity kernel from rigorous optimization logic. Unlike legacy heuristics that suffer from stochastic drift and algorithmic blending, FASC employs a Density-Augmented Similarity Selection rule and geometric constraints to guarantee deterministic, order-independent convergence. After validating FASC on canonical machine-learning ground truths (achieving >99.5% cluster purity and…
