Algorithms for Massive Data -- Lecture Notes
Nicola Prezza

TL;DR
These lecture notes survey algorithmic techniques for processing massive data sets that do not fit in memory, focusing on compressed data structures and data sketches that enable efficient analysis.
Contribution
The notes give a comprehensive introduction to both lossless methods (compressed data structures such as compressed suffix arrays) and lossy methods (sketches, probabilistic filters, Locality Sensitive Hashing) for large-scale data processing.
Findings
Discusses compressed suffix arrays and probabilistic filters.
Explores sketching techniques under various metrics.
Covers algorithms for streams and nearest neighbor search.
Abstract
These are the lecture notes for the course CM0622 - Algorithms for Massive Data, Ca' Foscari University of Venice. The goal of this course is to introduce algorithmic techniques for dealing with massive data: data so large that it does not fit in the computer's memory. There are two main solutions to deal with massive data: (lossless) compressed data structures and (lossy) data sketches. These notes cover both topics: compressed suffix arrays, probabilistic filters, sketching under various metrics, Locality Sensitive Hashing, nearest neighbour search, algorithms on streams.
Taxonomy
Topics: Machine Learning and Algorithms · Educational Technology and Assessment · Data Mining Algorithms and Applications
