COPR -- Efficient, large-scale log storage and retrieval

Julian Reichinger; Thomas Krismayer; Jan Rellermeyer

arXiv:2402.18355 · cs.IR · March 28, 2024 · 2 cites

Open Access

TL;DR

This paper introduces COPR, a novel compressed probabilistic retrieval algorithm for large-scale log storage and querying, offering significant improvements in storage efficiency, false-positive reduction, and query throughput over existing methods.

Contribution

COPR provides an efficient, scalable alternative to traditional indexing structures for streaming log data, with substantial gains in storage, accuracy, and speed.

Findings

01

Up to 93% less storage space than the tested state-of-the-art inverted index

02

Up to four orders of magnitude fewer false positives than the tested state-of-the-art membership sketch

03

Up to 250 times higher query throughput than the tested inverted index

Abstract

Modern, large-scale monitoring systems have to process and store vast amounts of log data in near real-time. At query time, these systems have to find relevant logs based on the content of the log message, using support structures that scale to these amounts of data while remaining efficient to use. We present our novel Compressed Probabilistic Retrieval algorithm (COPR), capable of answering Multi-Set Multi-Membership-Queries, which can be used as an alternative to existing indexing structures for streamed log data. In our experiments, COPR required up to 93% less storage space than the tested state-of-the-art inverted index and had up to four orders of magnitude fewer false positives than the tested state-of-the-art membership sketch. Additionally, COPR achieved up to 250 times higher query throughput than the tested inverted index and up to 240 times higher query throughput than the…
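To make the query type in the abstract concrete, here is a minimal sketch of a multi-set multi-membership query answered with one plain Bloom filter per log batch. This is not COPR and not the paper's method; it is only a naive probabilistic baseline. The reading of the query as "which batches may contain all of the given tokens", whitespace tokenization, the filter parameters, and every name (`BloomFilter`, `build_filters`, `candidate_batches`, the batch ids) are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter, standing in for any approximate membership sketch."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, token):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token):
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def build_filters(batches):
    """Build one filter per batch (set) from whitespace-split log tokens."""
    filters = {}
    for batch_id, messages in batches.items():
        bloom = BloomFilter()
        for message in messages:
            for token in message.split():
                bloom.add(token)
        filters[batch_id] = bloom
    return filters


def candidate_batches(filters, query_tokens):
    """Return ids of batches that may contain every query token."""
    return sorted(
        batch_id
        for batch_id, bloom in filters.items()
        if all(bloom.might_contain(token) for token in query_tokens)
    )


# Hypothetical log batches, invented for this example.
batches = {
    "batch-1": ["user login ok", "cache warm"],
    "batch-2": ["error timeout contacting db", "retry scheduled"],
    "batch-3": ["error disk full"],
}
filters = build_filters(batches)
result = candidate_batches(filters, ["error", "timeout"])
print(result)
```

Because a Bloom filter has no false negatives, the result always includes every batch that truly contains all query tokens (here `batch-2`), but it may also include batches that do not. Such false positives are the cost that the abstract's comparison against a membership sketch measures.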

Taxonomy

Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Video Analysis and Summarization