Efficient Hierarchical Clustering for Classification and Anomaly Detection
Ishita Doshi, Sreekalyan Sajjalla, Jayesh Choudhari, Rushi Bhatt, Anirban Dasgupta

TL;DR
This paper introduces scalable hierarchical clustering algorithms designed for real-time classification and anomaly detection in social network content, offering efficiency, theoretical guarantees, and strong empirical results.
Contribution
The paper presents novel hierarchical clustering methods optimized for large-scale, real-time classification and anomaly detection with proven theoretical properties.
Findings
Low query time and linear space complexity.
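Excellent empirical performance when compared against a range of classification techniques.
Theoretical guarantees with respect to the hierarchical clustering cost function of Dasgupta (2016).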
Abstract
We address the problem of large-scale, real-time classification of content posted on social networks, along with the need to rapidly identify novel spam types. Obtaining manual labels for user-generated content through editorial labeling and taxonomy development lags behind the rate at which new content types need to be classified. We propose a class of hierarchical clustering algorithms that can be used both for efficient and scalable real-time multiclass classification and for detecting new anomalies in user-generated content. Our methods have low query time and linear space usage, and come with theoretical guarantees with respect to a specific hierarchical clustering cost function (Dasgupta, 2016). We compare our solutions against a range of classification techniques and demonstrate excellent empirical performance.
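The abstract refers to the hierarchical clustering cost function of Dasgupta (2016) without stating it. As commonly defined, that cost sums, over every pair of points, their similarity weight multiplied by the number of leaves under the pair's lowest common ancestor in the tree, so a good tree separates dissimilar points near the root and similar points only near the leaves. The sketch below is a minimal illustration of that definition only, not the paper's algorithm. The nested-tuple tree encoding, the function names `leaves`, `dasgupta_cost`, and `weight`, and the toy similarity values are all assumptions made for this example.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels under a tree given as nested tuples (assumed encoding)."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def dasgupta_cost(tree, weight):
    """Sum over leaf pairs (i, j) of weight(i, j) * |leaves(lca(i, j))|."""
    if not isinstance(tree, tuple):
        return 0  # a single leaf contains no pairs
    child_leaves = [leaves(child) for child in tree]
    n_here = sum(len(ls) for ls in child_leaves)
    cost = 0
    # Pairs whose members fall in different children have this node as their LCA.
    for left, right in combinations(child_leaves, 2):
        for i in left:
            for j in right:
                cost += weight(i, j) * n_here
    # Pairs inside the same child are charged deeper in the tree.
    for child in tree:
        cost += dasgupta_cost(child, weight)
    return cost


# Toy similarities (hypothetical): a~b and c~d are similar, all other pairs are not.
SIMILAR = {frozenset("ab"), frozenset("cd")}


def weight(i, j):
    return 10 if frozenset((i, j)) in SIMILAR else 1


good_tree = (("a", "b"), ("c", "d"))  # similar points merged first
bad_tree = (("a", "c"), ("b", "d"))   # similar points split at the root

print(dasgupta_cost(good_tree, weight))  # 56
print(dasgupta_cost(bad_tree, weight))   # 92
```

The tree that merges similar points first gets the lower cost, which is the sense in which a guarantee "with respect to" this function speaks to clustering quality.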
Taxonomy
Topics: Complex Network Analysis Techniques · Spam and Phishing Detection · Network Security and Intrusion Detection
