COPR -- Efficient, large-scale log storage and retrieval
Julian Reichinger, Thomas Krismayer, Jan Rellermeyer

TL;DR
This paper introduces COPR, a novel compressed probabilistic retrieval algorithm for large-scale log storage and querying, offering significant improvements in storage efficiency, false-positive reduction, and query throughput over existing methods.
Contribution
COPR provides an efficient, scalable alternative to traditional indexing structures for streaming log data, with substantial gains in storage, accuracy, and speed.
Findings
Up to 93% less storage space than the tested state-of-the-art inverted index
Up to four orders of magnitude fewer false positives than the tested state-of-the-art membership sketch
Up to 250 times higher query throughput than the tested inverted index
Abstract
Modern, large-scale monitoring systems have to process and store vast amounts of log data in near real-time. At query time, these systems have to find relevant logs based on the content of the log message, using support structures that scale to these amounts of data while remaining efficient to use. We present our novel Compressed Probabilistic Retrieval algorithm (COPR), which answers Multi-Set Multi-Membership-Queries and can be used as an alternative to existing indexing structures for streamed log data. In our experiments, COPR required up to 93% less storage space than the tested state-of-the-art inverted index and produced up to four orders of magnitude fewer false positives than the tested state-of-the-art membership sketch. Additionally, COPR achieved up to 250 times higher query throughput than the tested inverted index and up to 240 times higher query throughput than the…
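This summary does not describe COPR's internals, so the sketch below does NOT implement COPR. It only illustrates, under our own reading of the abstract, the kind of query being answered: given many log blocks (each treated as a set of tokens), report every block that contains a query term. It contrasts an exact inverted index with a per-block Bloom filter, a generic membership sketch used here as a stand-in because the summary does not name the sketch that was tested. All names (`tokenize`, `build_inverted_index`, `BloomFilter`, `build_sketches`, `query_index`, `query_sketches`) and parameters (`num_bits`, `num_hashes`) are hypothetical.

```python
import hashlib
from collections import defaultdict


def tokenize(line):
    # Hypothetical tokenizer: lowercase, split on whitespace.
    return set(line.lower().split())


def build_inverted_index(blocks):
    """Exact baseline: maps each token to the ids of the blocks containing it."""
    index = defaultdict(set)
    for block_id, lines in enumerate(blocks):
        for line in lines:
            for token in tokenize(line):
                index[token].add(block_id)
    return index


class BloomFilter:
    """Classic Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, token):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits[pos] = 1

    def might_contain(self, token):
        return all(self.bits[pos] for pos in self._positions(token))


def build_sketches(blocks, num_bits=64, num_hashes=3):
    """One small Bloom filter per log block (a generic membership sketch)."""
    sketches = []
    for lines in blocks:
        bf = BloomFilter(num_bits, num_hashes)
        for line in lines:
            for token in tokenize(line):
                bf.add(token)
        sketches.append(bf)
    return sketches


def query_index(index, token):
    # Exact answer: every block that really contains the token.
    return set(index.get(token.lower(), set()))


def query_sketches(sketches, token):
    # Approximate answer: a superset of the exact one (may include false positives).
    token = token.lower()
    return {i for i, bf in enumerate(sketches) if bf.might_contain(token)}


blocks = [
    ["ERROR disk full on node-7", "INFO retry scheduled"],
    ["INFO user login ok", "WARN slow query"],
    ["ERROR timeout contacting node-3"],
]
index = build_inverted_index(blocks)
sketches = build_sketches(blocks)

exact = query_index(index, "error")
approx = query_sketches(sketches, "error")
print("exact:", sorted(exact))  # exact: [0, 2]
print("sketch:", sorted(approx), "false positives:", sorted(approx - exact))
```

The trade-off the abstract measures is visible here: the inverted index is exact but stores every token-to-block posting, while the per-block filters are small but can report blocks that do not contain the term. Shrinking `num_bits` saves space and raises the false-positive rate.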
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Video Analysis and Summarization
