Interactive (statistical) visualisation and exploration of a billion objects with Vaex
Maarten A. Breddels

TL;DR
This paper introduces Vaex, a Python library that enables rapid, interactive visualization of billion-object datasets by computing statistics on a regular grid rather than rendering each object as a glyph.
Contribution
The paper presents an approach that combines grid-based statistics with memory mapping to visualize billion-object datasets interactively, within a second.
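The grid-based statistics idea can be illustrated with a minimal NumPy sketch. It is not vaex's implementation, and the function `grid_statistics` and its parameters are hypothetical. Each object is mapped to a cell of a regular 2D grid, and a single pass with `np.bincount` accumulates the per-cell count and the per-cell sum of a third quantity `z`, from which the per-cell mean follows. The work per object is a few arithmetic operations, regardless of how many objects land in the same cell.

```python
import numpy as np

def grid_statistics(x, y, z, limits, shape):
    """Count objects and the mean of z on a regular 2D grid (illustrative sketch)."""
    (xmin, xmax), (ymin, ymax) = limits
    nx, ny = shape
    # Keep only objects inside the half-open limits; NaNs fail these tests too.
    inside = (x >= xmin) & (x < xmax) & (y >= ymin) & (y < ymax)
    # Map coordinates to integer bin indices; clip guards against rounding up to nx/ny.
    ix = np.minimum(((x[inside] - xmin) * (nx / (xmax - xmin))).astype(np.int64), nx - 1)
    iy = np.minimum(((y[inside] - ymin) * (ny / (ymax - ymin))).astype(np.int64), ny - 1)
    flat = ix * ny + iy  # row-major cell index
    # One pass each: per-cell counts and per-cell sums of z.
    counts = np.bincount(flat, minlength=nx * ny).reshape(nx, ny)
    sums = np.bincount(flat, weights=z[inside], minlength=nx * ny).reshape(nx, ny)
    # Mean per cell; empty cells become NaN instead of raising a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return counts, means
```

vaex itself exposes comparable aggregations through DataFrame methods such as `count` and `mean` with a `binby` argument; consult its documentation for the actual signatures.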
Findings
A billion objects visualized within a second on a modern desktop computer.
Efficient handling of large datasets using memory mapping and binning.
Vaex integrates with existing scientific Python tools for seamless analysis.
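Memory mapping lets the operating system page in only the part of a file that is currently being read, so a grid can be accumulated chunk by chunk over a file larger than RAM. This sketch assumes a hypothetical raw binary file holding an `x` column followed by a `y` column of little-endian float64 values, standing in for the HDF5 columns that vaex memory-maps. `count_memmapped` and its arguments are illustrative names, not part of vaex.

```python
import numpy as np

def count_memmapped(path, n_rows, limits, shape, chunk_size=1_000_000):
    """Accumulate a 2D count grid from a memory-mapped columnar file (sketch)."""
    # Assumed layout: n_rows float64 x values, then n_rows float64 y values.
    columns = np.memmap(path, dtype="<f8", mode="r", shape=(2, n_rows))
    (xmin, xmax), (ymin, ymax) = limits
    nx, ny = shape
    counts = np.zeros(nx * ny, dtype=np.int64)
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Slicing the memmap only touches this chunk's pages on disk.
        x = np.asarray(columns[0, start:stop])
        y = np.asarray(columns[1, start:stop])
        inside = (x >= xmin) & (x < xmax) & (y >= ymin) & (y < ymax)
        ix = np.minimum(((x[inside] - xmin) * (nx / (xmax - xmin))).astype(np.int64), nx - 1)
        iy = np.minimum(((y[inside] - ymin) * (ny / (ymax - ymin))).astype(np.int64), ny - 1)
        # Add this chunk's per-cell counts to the running total.
        counts += np.bincount(ix * ny + iy, minlength=nx * ny)
    return counts.reshape(nx, ny)
```

Because each chunk's contribution is simply added to the running total, chunks could also be binned independently (for example, one per core) and their grids summed at the end.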
Abstract
With new catalogues such as Gaia DR1, which contains more than a billion objects, new methods of handling and visualizing these data volumes are needed. In visualization, one problem is that the number of datapoints can become so large that a scatter plot becomes cluttered. Another problem is that with over a billion objects, only a few CPU cycles are available per object if one wants to process them within a second, which makes traditional methods that render a glyph per object unviable. Instead, we show that by calculating statistics on a regular (N-dimensional) grid, visualizations of a billion objects can be done within a second on a modern desktop computer. This is achieved using memory mapping of HDF5 files together with a simple binning algorithm, both part of a Python library called vaex. This enables efficient, interactive exploration of large datasets, making science…
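The statement that only a few CPU cycles are available per object follows from simple arithmetic. Assuming, purely for illustration, a desktop with 4 cores at 3 GHz (these figures are not from the paper), the budget for processing $10^9$ objects in one second is:

```latex
\[
\frac{N_\text{cores}\, f_\text{clock}\, t}{N_\text{objects}}
  = \frac{4 \times 3\times10^{9}\ \text{cycles}\,\text{s}^{-1} \times 1\ \text{s}}{10^{9}}
  = 12\ \text{cycles per object}
\]
```

On a single core the budget drops to about 3 cycles per object. That is far too little to rasterize a glyph per object, but it leaves room, at best, for a handful of operations such as computing a bin index and incrementing a counter.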
