Non-parametric Message Important Measure: Storage Code Design and Transmission Planning for Big Data
Shanyun Liu, Rui She, Pingyi Fan, Khaled B. Letaief

TL;DR
This paper introduces a non-parametric message importance measure (NMIM) for big data, enabling efficient storage and transmission by capturing rare events and diversity, with strategies validated through simulations.
Contribution
It proposes a novel NMIM metric for big data message importance and develops encoding and transmission strategies that optimize storage and communication efficiency.
Findings
NMIM effectively characterizes rare events and diversity in big data.
Proposed encoding reduces storage space while preserving message importance.
The maximum transmission exhibits a growth region and a saturation region across typical channel models.
Abstract
This paper discusses storage and transmission in big data while taking message importance into account. Similar to Shannon entropy and Rényi entropy, we define the non-parametric message important measure (NMIM) as a measure of message importance in big data scenarios, which characterizes the uncertainty of random events. We prove that the proposed NMIM sufficiently describes two key characteristics of big data: finding rare events and large diversity of events. Based on NMIM, we first propose an effective compressed encoding mode for data storage, and then discuss channel transmission over some typical channel models. Numerical simulation results show that our proposed strategy occupies less storage space without losing much message importance, and that the maximum transmission has a growth region and a saturation region, which contributes to…
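The abstract positions NMIM alongside Shannon and Rényi entropy and claims it highlights rare events and event diversity. As a minimal sketch, the Python below computes all three quantities for a discrete distribution. The NMIM form used here, L(P) = log Σ p_i exp((1 − p_i)/p_i), is an assumption about the paper's definition (it is not stated on this page) and should be checked against the paper; the function names are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy H(P) = -sum p_i log p_i in nats; zero-probability terms contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) in nats."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)


def nmim(p: Sequence[float]) -> float:
    """ASSUMED NMIM form (verify against the paper):
    L(P) = log sum_i p_i * exp((1 - p_i) / p_i).

    Evaluated with the log-sum-exp trick, because exp((1 - p_i) / p_i)
    overflows float64 once p_i falls below roughly 1/710.
    """
    # Log of each summand: log p_i + (1 - p_i) / p_i
    terms = [math.log(pi) + (1 - pi) / pi for pi in p if pi > 0]
    m = max(terms)
    # Factor out the largest term so every exp() argument is <= 0
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    for dist in ([0.25] * 4, [0.99, 0.01]):
        print(dist, shannon_entropy(dist), renyi_entropy(dist, 2.0), nmim(dist))
```

Under this assumed form, a uniform distribution over n outcomes gives an NMIM of n − 1, which grows linearly with diversity while Shannon entropy grows only as log n. For P = (0.99, 0.01), the Shannon entropy is about 0.056 nats, whereas the assumed NMIM is about 94.39 because the rare outcome dominates the sum.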
Taxonomy
Topics: Advanced Data Storage Technologies · Caching and Content Delivery · Optimization and Search Problems
