Beyond Standard Datacubes: Extracting Features from Irregular and Branching Earth System Data
Mathilde Leuridan, James Hawkes, Tiago Quintino, Martin Schultz

TL;DR
This paper introduces a generalized tree-based data hypercube model for efficiently representing, indexing, and extracting features from complex, irregular Earth science datasets, enabling scalable and flexible data access.
Contribution
It presents a novel compressed tree structure for data hypercubes and integrates it into a feature extraction system within the Polytope framework, enhancing support for complex data spaces.
Findings
Efficient indexing of large, irregular datasets using compressed tree hypercubes.
Improved data access and feature extraction performance on complex Earth science data.
Demonstrated scalability and flexibility of the approach in practical evaluations.
Abstract
Earth science datasets are growing rapidly in both volume and structural complexity. They increasingly contain richly labelled data with heterogeneous metadata and complex internal constraints that impose dependencies between variables and dimensions. Datacubes have become a common abstraction for organising such datasets, but traditional dense and orthogonal datacube models struggle to represent irregular, sparse or branching data spaces efficiently. In this paper, we introduce a generalised data hypercube representation based on compressed tree structures, which enables an accurate and compact description of complex data spaces. We describe the design of this representation and analyse its ability to capture sparsity and conditional relationships while remaining efficient to traverse. Using a concrete implementation, we study the performance characteristics of compressed tree data…
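To make the idea of a compressed tree hypercube concrete, here is a minimal, hypothetical sketch. It is not the Polytope API or the authors' implementation. The `Node` layout, the helper names (`build`, `compress`, `expand`, `count_points`, `count_nodes`, `show`), the axis names (`param`, `levtype`, `levelist`, `step`) and the merge rule are all assumptions for illustration. Each node stores one axis and several values that share the same subtree. This lets a branching data space be represented without padding it to a dense cube: here, pressure-level fields carry a `levelist` axis and the surface field does not. The assumed compression rule merges sibling nodes on the same axis whose subtrees are identical.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from itertools import product

@dataclass
class Node:
    """Hypothetical compressed-tree node: one axis plus several values
    that all share the same child subtree."""
    axis: str
    values: tuple
    children: list[Node] = field(default_factory=list)

    def key(self) -> tuple:
        # Structural signature (order-sensitive) used to detect identical subtrees.
        return (self.axis, self.values, tuple(c.key() for c in self.children))

def build(points) -> list[Node]:
    """Build an uncompressed tree (one value per node) from index points.
    Each point is a tuple of (axis, value) pairs and may skip axes (branching)."""
    roots: list[Node] = []
    for point in points:
        level = roots
        for axis, value in point:
            match = next((n for n in level if n.axis == axis and n.values == (value,)), None)
            if match is None:
                match = Node(axis, (value,))
                level.append(match)
            level = match.children
    return roots

def compress(nodes: list[Node]) -> list[Node]:
    """Bottom-up: merge sibling nodes on the same axis with identical subtrees."""
    merged: dict[tuple, Node] = {}
    for n in nodes:
        kids = compress(n.children)
        sig = (n.axis, tuple(k.key() for k in kids))
        if sig in merged:
            merged[sig].values = merged[sig].values + n.values
        else:
            merged[sig] = Node(n.axis, n.values, kids)
    return list(merged.values())

def count_points(node: Node) -> int:
    # Every value of this node leads to the full set of child branches.
    if not node.children:
        return len(node.values)
    return len(node.values) * sum(count_points(c) for c in node.children)

def count_nodes(nodes: list[Node]) -> int:
    return sum(1 + count_nodes(n.children) for n in nodes)

def expand(node: Node, prefix: dict | None = None):
    """Yield every concrete index point (axis -> value) under a node."""
    prefix = prefix or {}
    for v in node.values:
        point = {**prefix, node.axis: v}
        if not node.children:
            yield point
        for c in node.children:
            yield from expand(c, point)

def show(nodes: list[Node], indent: int = 0) -> None:
    for n in nodes:
        print("  " * indent + f"{n.axis}={list(n.values)}")
        show(n.children, indent + 1)

# Illustrative data space: two pressure-level params with levels and steps,
# plus one surface param that has no levelist axis at all.
points = [(("param", p), ("levtype", "pl"), ("levelist", l), ("step", s))
          for p, l, s in product(("t", "u"), (500, 850), (0, 6))]
points += [(("param", "2t"), ("levtype", "sfc"), ("step", s)) for s in (0, 6)]

raw = build(points)
tree = compress(raw)
print(count_nodes(raw), count_nodes(tree))     # 20 7
print(sum(count_points(n) for n in tree))      # 10
show(tree)
```

In this toy case the compressed tree needs 7 nodes instead of 20 and still expands to exactly the 10 valid index points. A dense orthogonal cube over the union of axis values (3 params × 2 level types × 2 levels × 2 steps) would hold 24 cells, most of them invalid. Traversal stays a plain recursive walk, which is the property a feature-extraction step would rely on.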
Taxonomy
Topics: Scientific Computing and Data Management · Advanced Database Systems and Queries · Advanced Data Storage Technologies
