Detecting Group Anomalies in Tera-Scale Multi-Aspect Data via Dense-Subtensor Mining
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos

TL;DR
This paper introduces D-CUBE, a disk-based and distributed method for detecting dense subtensors in large-scale multi-aspect data, significantly improving memory efficiency, speed, and accuracy over existing approaches.
Contribution
The paper presents D-CUBE, a novel scalable and memory-efficient algorithm for dense-subtensor detection in massive tensors, with provable accuracy and practical effectiveness.
Findings
Handles data 1,000× larger than previous methods can
Runs up to 7× faster
Achieves high accuracy in detecting network attacks and synchronized behaviors
Abstract
How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense subtensors in real-world tensors (e.g., social media, Wikipedia, TCP dumps) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been proposed for detecting dense subtensors rapidly and accurately. However, existing methods have low accuracy, or they assume that tensors are small enough to fit in main memory, which is unrealistic in many real-world applications such as social media and the web. To overcome these limitations, we propose D-CUBE, a disk-based dense-subtensor detection method, which can also run in a distributed manner across multiple machines.…
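To make the idea of a "dense subtensor" concrete, the sketch below scores a block by its number of entries divided by its average mode size, then greedily peels off the attribute value that carries the fewest entries, keeping the densest block seen along the way. This is a minimal in-memory illustration only, not D-CUBE itself (which is disk-based and distributed). The density measure, the function names density and find_dense_block, and the toy records in the usage note are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import defaultdict


def density(mass, sizes):
    """Entries divided by the average mode size (an assumed density measure)."""
    total = sum(sizes)
    return mass * len(sizes) / total if total else 0.0


def find_dense_block(records):
    """Greedy peeling over a list of equal-length tuples, one tuple per event.

    Returns (block, density), where block is a list with one set of
    attribute values per mode. Hypothetical helper, not the paper's algorithm.
    """
    remaining = list(records)
    best_density, best_block = 0.0, []
    while remaining:
        n_modes = len(remaining[0])
        # Count how many remaining records each (mode, value) pair touches.
        mass = defaultdict(int)
        for record in remaining:
            for mode, value in enumerate(record):
                mass[(mode, value)] += 1
        # Number of distinct values still present in each mode.
        sizes = [0] * n_modes
        for mode, _value in mass:
            sizes[mode] += 1
        current = density(len(remaining), sizes)
        if current > best_density:
            best_density = current
            best_block = [
                {v for (m, v) in mass if m == mode} for mode in range(n_modes)
            ]
        # Peel the value with the least mass; ties are broken deterministically.
        mode, value = min(mass, key=lambda k: (mass[k], k[0], str(k[1])))
        remaining = [r for r in remaining if r[mode] != value]
    return best_block, best_density
```

As a usage example, one could pass a list of (user, item, time) tuples in which three users all touch the same three items at one timestamp, plus a few unrelated events; the lockstep group should then come back as the densest block. Recounting every value on each pass is quadratic and assumes the data fits in memory, which is exactly the limitation the abstract says D-CUBE is designed to avoid.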
Taxonomy
Topics: Tensor decomposition and applications · Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting
