Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake

Zhiwei Bao; Liu Liao-Liao; Zhiyu Wu; Yifan Zhou; Dan Fan; Michal Aibin; Yvonne Coady; Andrew Brownsword

arXiv:2405.03708 · cs.DC · May 14, 2024

Open Access

TL;DR

This paper introduces Delta Tensor, a new method for storing vectors and tensors efficiently in Delta Lake, significantly improving space and time performance for AI and ML applications in cloud environments.

Contribution

It adapts array database strategies and sparse encoding to Delta Lake, enabling efficient tensor storage in a Lakehouse architecture.
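As a rough illustration of the array-database strategy mentioned above, the sketch below splits a dense 2-D tensor into fixed-size chunks and emits one record per chunk, which is the kind of row a table format could hold. The row layout (`chunk_index`, `chunk_shape`, `values`) and the helper names are assumptions made for this example, not the paper's schema, and plain Python lists stand in for a Delta Lake table.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
ChunkRow = Dict[str, object]  # hypothetical row layout, not the paper's schema


def to_chunk_rows(matrix: Matrix, chunk: int) -> List[ChunkRow]:
    """Split a dense 2-D tensor into chunk x chunk tiles, one record per tile."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows: List[ChunkRow] = []
    for r0 in range(0, n_rows, chunk):
        for c0 in range(0, n_cols, chunk):
            # Edge tiles may be smaller than chunk x chunk.
            tile = [matrix[r][c0:c0 + chunk]
                    for r in range(r0, min(r0 + chunk, n_rows))]
            rows.append({
                "chunk_index": (r0 // chunk, c0 // chunk),
                "chunk_shape": (len(tile), len(tile[0])),
                "values": [v for tile_row in tile for v in tile_row],
            })
    return rows


def from_chunk_rows(rows: List[ChunkRow], shape: Tuple[int, int], chunk: int) -> Matrix:
    """Rebuild the dense tensor from its chunk records."""
    matrix = [[0.0] * shape[1] for _ in range(shape[0])]
    for row in rows:
        ci, cj = row["chunk_index"]
        h, w = row["chunk_shape"]
        for i in range(h):
            for j in range(w):
                matrix[ci * chunk + i][cj * chunk + j] = row["values"][i * w + j]
    return matrix


dense = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
chunk_rows = to_chunk_rows(dense, chunk=2)
restored = from_chunk_rows(chunk_rows, shape=(4, 5), chunk=2)
```

Because each chunk is an independent record, a reader that needs only a slice of the tensor can fetch the chunks overlapping that slice instead of deserializing the whole array.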

Findings

01. Notable improvements in space efficiency
02. Enhanced time performance over traditional serialization
03. Effective for AI and ML data management

Abstract

The exponential growth of artificial intelligence (AI) and machine learning (ML) applications has necessitated the development of efficient storage solutions for vector and tensor data. This paper presents a novel approach for tensor storage in a Lakehouse architecture using Delta Lake. The approach adapts the multidimensional array storage strategy of array databases and sparse encoding methods to Delta Lake tables. In experiments, it shows notable improvements in both space and time efficiency compared to traditional serialization of tensors. These results provide valuable insights for the development and implementation of optimized vector and tensor storage solutions in data-intensive applications, contributing to the evolution of efficient data management practices in AI and ML domains in cloud-native environments.
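To make the idea of sparse encoding concrete, here is a minimal sketch of coordinate (COO) encoding, one common sparse format: only the nonzero entries are kept, each as an (index, value) record. The abstract does not say which sparse encodings the paper uses, so COO is chosen here purely for illustration, and the helper names `zeros`, `to_coo`, and `from_coo` are hypothetical. Nested Python lists stand in for tensors and a list of tuples stands in for table rows.

```python
from itertools import product
from typing import List, Sequence, Tuple

CooRecord = Tuple[Tuple[int, ...], float]  # (index tuple, value)


def zeros(shape: Sequence[int]):
    """Build a nested list of zeros with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


def get_at(tensor, index: Sequence[int]) -> float:
    """Read one element of a nested-list tensor."""
    for i in index:
        tensor = tensor[i]
    return tensor


def to_coo(tensor, shape: Sequence[int]) -> List[CooRecord]:
    """Keep only nonzero entries, one (index, value) record each."""
    records = []
    for index in product(*(range(n) for n in shape)):
        value = get_at(tensor, index)
        if value != 0:
            records.append((index, value))
    return records


def from_coo(records: List[CooRecord], shape: Sequence[int]):
    """Rebuild the dense tensor; entries without a record are zero."""
    tensor = zeros(shape)
    for index, value in records:
        target = tensor
        for i in index[:-1]:
            target = target[i]
        target[index[-1]] = value
    return tensor


shape = (2, 3, 4)
sparse = zeros(shape)
sparse[0][1][2] = 5.0
sparse[1][2][3] = 7.5

records = to_coo(sparse, shape)
restored = from_coo(records, shape)
```

In this example the 24-element tensor is represented by two records. The saving grows with sparsity, while a mostly dense tensor would be larger in COO form than in a flat serialized layout, since every value also carries its index.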

Taxonomy

Topics: Computational Physics and Python Applications · Seismic Imaging and Inversion Techniques · Tensor decomposition and applications