Adaptive Normalization in Streaming Data
Vibhuti Gupta, Rattikorn Hewett

TL;DR
This paper introduces a distributed, adaptive normalization method for streaming Big Data that dynamically adjusts to data changes using sliding windows, improving normalization accuracy and efficiency in real-time processing frameworks.
Contribution
It presents a novel distributed adaptive normalization technique for Big Data streams that operates without being tailored to specific tasks and offers adjustable tradeoffs between speed and accuracy.
Findings
Achieved 89% improvement over baseline in normalization accuracy.
Normalized 160,000 data instances with low RMS error of 0.0041.
Implemented on Apache Storm for real-time, scalable data processing.
Abstract
In today's digital era, data are everywhere, from the Internet of Things to health care and financial applications. This leads to potentially unbounded, ever-growing Big Data streams that need to be utilized effectively. Data normalization is an important preprocessing technique for data analytics. It helps prevent mismodeling and reduces the complexity inherent in the data, especially for data integrated from multiple sources and contexts. Normalization of Big Data streams is challenging because of evolving inconsistencies, time and memory constraints, and the unavailability of the whole data set beforehand. This paper proposes a distributed approach to adaptive normalization for Big Data streams. Using sliding windows of fixed size, it provides a simple mechanism to adapt the statistics for normalizing changing data in each window. Implemented on Apache Storm, a distributed real-time stream data…
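The sliding-window idea described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the choice of min-max scaling, the function name `normalize_stream`, and the `window_size` parameter are all assumptions made for illustration, and the sketch omits the distributed Apache Storm layer entirely. It only shows how statistics recomputed over a fixed-size window let the normalization follow changes in the data.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def normalize_stream(stream: Iterable[float], window_size: int) -> Iterator[float]:
    """Yield each value min-max scaled against the latest `window_size` values.

    Hypothetical helper: min-max scaling is an assumed choice of statistic,
    not necessarily the one used in the paper.
    """
    if window_size < 1:
        raise ValueError("window_size must be positive")

    # A bounded deque drops the oldest value automatically once it is full,
    # which gives a fixed-size sliding window over the stream.
    window: Deque[float] = deque(maxlen=window_size)

    for x in stream:
        window.append(x)
        lo, hi = min(window), max(window)
        # A constant window has no range; map it to 0.0 to avoid dividing by zero.
        yield 0.0 if hi == lo else (x - lo) / (hi - lo)


if __name__ == "__main__":
    readings = [5.0, 1.0, 3.0, 40.0, 20.0]
    for raw, scaled in zip(readings, normalize_stream(readings, window_size=3)):
        print(f"{raw:>6.1f} -> {scaled:.3f}")
```

In this sketch a smaller `window_size` reacts faster to shifts in the data but bases each statistic on fewer points, which is one simple way to picture the speed and accuracy tradeoff mentioned in the summary. Recomputing the minimum and maximum on every value costs time proportional to the window size; a production version would likely maintain them incrementally.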
