Efficient algorithms for collecting the statistics of large-scale IP address data
Hui Liu, Yi Cao, Zehan Cai, Hua Mao, and Jie Chen

TL;DR
This paper introduces two efficient algorithms for large-scale IP address statistics collection, optimizing time and memory use by leveraging the sparse nature of IP data and dynamic hash indexing, with a parallel scheme for faster processing.
Contribution
The paper presents novel algorithms that avoid hash collisions and adapt hash index length, significantly improving efficiency in large-scale IP data analysis.
Findings
Outperforms baseline methods in time efficiency
Uses less memory than existing solutions
Supports parallel processing for faster computation
Abstract
Compiling the statistics of large-scale IP address data is an essential task in network traffic measurement. The statistical results are used to evaluate the potential impact of user behaviors on network traffic. This requires algorithms that are capable of storing and retrieving a high volume of IP addresses within time and memory constraints. In this paper, we present two efficient algorithms for collecting the statistics of large-scale IP addresses that balance time efficiency and memory consumption. The proposed solutions take into account the sparse nature of the statistics of IP addresses while building the hash function and maintain a dynamic balance among layered memory blocks. There are two layers in the first proposed method, each of which contains a limited number of memory blocks. Each memory block contains 256 elements of size bytes for a 64-bit system. In…
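The collision-free, sparsity-aware indexing idea in the abstract can be illustrated with a small sketch. It is not the paper's two-layer design, whose details are cut off in this summary. It assumes a simpler variant: a four-level table keyed by the octets of an IPv4 address, where each 256-slot block is allocated only when an address first needs it. The class name `OctetTrieCounter` and its methods are hypothetical and chosen for illustration only.

```python
from typing import List, Optional

BLOCK_SIZE = 256  # one slot per possible octet value (0-255)


class OctetTrieCounter:
    """Counts IPv4 addresses using lazily allocated 256-slot blocks.

    Illustrative assumption: four levels, one per octet. This is not the
    two-layer layout described in the paper.
    """

    def __init__(self) -> None:
        # The root block is indexed by the first octet.
        self.root: List[Optional[list]] = [None] * BLOCK_SIZE
        self.blocks_allocated = 1

    def _new_block(self, leaf: bool) -> list:
        # Leaf blocks hold counters; interior blocks hold child references.
        self.blocks_allocated += 1
        return [0] * BLOCK_SIZE if leaf else [None] * BLOCK_SIZE

    @staticmethod
    def _octets(ip: str) -> List[int]:
        octets = [int(part) for part in ip.split(".")]
        if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
            raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
        return octets

    def add(self, ip: str) -> None:
        octets = self._octets(ip)
        block = self.root
        # Walk the first three octets, allocating blocks only on demand,
        # so unused regions of the address space cost no memory.
        for depth, octet in enumerate(octets[:3]):
            if block[octet] is None:
                block[octet] = self._new_block(leaf=(depth == 2))
            block = block[octet]
        # The last octet indexes a counter directly: distinct addresses
        # never share a slot, so there are no hash collisions to resolve.
        block[octets[3]] += 1

    def count(self, ip: str) -> int:
        octets = self._octets(ip)
        block = self.root
        for octet in octets[:3]:
            block = block[octet]
            if block is None:
                return 0  # this prefix was never seen
        return block[octets[3]]


counter = OctetTrieCounter()
for address in ["10.0.0.1", "10.0.0.1", "10.0.0.2", "192.168.1.7"]:
    counter.add(address)
```

After this run, `counter.count("10.0.0.1")` returns 2 and only seven blocks exist (the root plus three per distinct /24 prefix), which is the sense in which sparse address data keeps memory use low. A real implementation would use fixed-width arrays rather than Python lists.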
Taxonomy
Topics: Network Packet Processing and Optimization · Network Security and Intrusion Detection · Algorithms and Data Compression
