Coresets for Vector Summarization with Applications to Network Graphs
Dan Feldman, Sedat Ozer, Daniela Rus

TL;DR
This paper introduces a deterministic coreset algorithm for vector summarization that approximates the mean of large vector sets efficiently, with applications in network graph analysis and user activity summarization.
Contribution
The paper presents a novel deterministic algorithm for constructing small, weighted subsets (coresets) that approximate the mean of high-dimensional vectors. The size of the coreset is independent of both the number of input vectors and their dimension.
Findings
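The squared Euclidean distance between the coreset's weighted mean and the true mean is at most a small factor times the variance of the data.
In streaming settings, the algorithm maintains an approximate sum of vectors using memory that is independent of the dimension and logarithmic in the number of vectors seen so far.
The method is effective at identifying heavy hitters and summarizing user activity in large datasets.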
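The error guarantee in the first finding can be written out as a formula. The notation here is an assumption made for illustration: $P$ is the input set of $n$ vectors, $S$ is the selected subset with weights $w$, $\varepsilon$ is the error parameter, the variance is taken to be the mean squared distance to the mean, and the subset size is taken to be $O(1/\varepsilon)$.

```latex
\[
\bar{p} = \frac{1}{n}\sum_{p \in P} p,
\qquad
\tilde{p} = \sum_{p \in S} w(p)\, p,
\qquad S \subseteq P,\ |S| = O(1/\varepsilon)
\]
\[
\|\bar{p} - \tilde{p}\|^2 \;\le\; \varepsilon \cdot \mathrm{var}(P),
\qquad
\mathrm{var}(P) = \frac{1}{n}\sum_{p \in P} \|p - \bar{p}\|^2
\]
```

Neither the subset size nor the bound mentions $n$ or the dimension, which is what makes the summary size independent of both.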
Abstract
We provide a deterministic data summarization algorithm that approximates the mean $\bar{p}=\frac{1}{n}\sum_{p\in P} p$ of a set $P$ of $n$ vectors in $\mathbb{R}^d$, by a weighted mean $\tilde{p}$ of a \emph{subset} of $O(1/\varepsilon)$ vectors, i.e., independent of both $n$ and $d$. We prove that the squared Euclidean distance between $\bar{p}$ and $\tilde{p}$ is at most $\varepsilon$ multiplied by the variance of $P$. We use this algorithm to maintain an approximated sum of vectors from an unbounded stream, using memory that is independent of $d$, and logarithmic in the $n$ vectors seen so far. Our main application is to extract and represent in a compact way friend groups and activity summaries of users from underlying data exchanges. For example, in the case of mobile networks, we can use GPS traces to identify meetings, in the case of social networks, we can use information exchange to identify…
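To make the idea of a weighted subset that approximates the mean concrete, here is a minimal Python sketch. It is not the authors' algorithm. It uses a textbook Frank-Wolfe iteration with exact line search over the probability simplex, which is one standard deterministic way to obtain a sparse convex combination close to a target point. The function name `mean_coreset` and the parameter `k` (the maximum subset size) are hypothetical. The usual Frank-Wolfe analysis bounds the error by the squared diameter of the data divided by the number of iterations, which is weaker than the variance-based bound stated in the abstract.

```python
import numpy as np


def mean_coreset(points, k):
    """Illustrative sketch (hypothetical helper, not the paper's method).

    Returns (indices, weights) such that the weights are non-negative,
    sum to 1, and at most k of the input rows are selected.
    """
    P = np.asarray(points, dtype=float)
    n = P.shape[0]
    mean = P.mean(axis=0)

    w = np.zeros(n)
    # Start from the single input point closest to the mean.
    start = int(np.argmin(((P - mean) ** 2).sum(axis=1)))
    w[start] = 1.0
    c = P[start].copy()  # current weighted mean of the selected points

    for _ in range(k - 1):
        r = c - mean  # residual; the objective is ||r||^2
        # Frank-Wolfe vertex: the point minimizing the linearized objective.
        i = int(np.argmin(P @ r))
        d = c - P[i]
        denom = float(d @ d)
        if denom == 0.0:
            break
        # Exact line search for the step toward P[i], clipped to [0, 1].
        gamma = float(np.clip((r @ d) / denom, 0.0, 1.0))
        if gamma == 0.0:
            break  # no further improvement is possible
        w *= 1.0 - gamma
        w[i] += gamma
        c = c - gamma * d  # equals (1 - gamma) * c + gamma * P[i]

    idx = np.flatnonzero(w)
    return idx, w[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 50))
    idx, weights = mean_coreset(X, k=40)
    approx = weights @ X[idx]
    print(len(idx), float(((approx - X.mean(axis=0)) ** 2).sum()))
```

Each iteration adds at most one new point to the support, so the returned subset has at most `k` points regardless of how many vectors are in the input or how large their dimension is.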
Taxonomy
Topics: Complex Network Analysis Techniques · Data Management and Algorithms · Graph Theory and Algorithms
