NoSQL Database Tuning through Machine Learning
Florian Eppinger, Uta Störl

TL;DR
This paper presents a machine learning-based approach that automatically tunes NoSQL database configurations through surrogate modeling and black-box optimization. In experiments on Apache Cassandra, it improved throughput by up to 4% and reduced read and write latency by up to 43% and 39%, respectively.
Contribution
It introduces a method that uses Random Forest and Gradient Boosting models as surrogate models to tune NoSQL database configuration settings, addressing the complexity of the inter-dependencies among those settings.
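The surrogate-plus-optimizer idea can be illustrated with a minimal sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the knob names and ranges are hypothetical, `fake_benchmark` is a synthetic stand-in for a real benchmark run, a simple k-nearest-neighbor average stands in for the Random Forest and Gradient Boosting surrogates so the example needs only the standard library, and plain random search stands in for the black-box optimizer.

```python
import random

# Hypothetical knob names and ranges, chosen only for illustration.
KNOBS = {"knob_a": (8.0, 128.0), "knob_b": (8.0, 128.0)}


def fake_benchmark(cfg, rng):
    """Synthetic stand-in for a real benchmark run; returns a noisy 'throughput'."""
    a, b = cfg
    return 1000.0 - 0.1 * (a - 64.0) ** 2 - 0.2 * (b - 32.0) ** 2 + rng.gauss(0.0, 1.0)


def sample_config(rng):
    """Draw one configuration uniformly at random from the knob ranges."""
    return tuple(rng.uniform(lo, hi) for lo, hi in KNOBS.values())


def collect_training_data(rng, n):
    """Benchmark n random configurations to build the surrogate's training set."""
    data = []
    for _ in range(n):
        cfg = sample_config(rng)
        data.append((cfg, fake_benchmark(cfg, rng)))
    return data


def knn_predict(train, cfg, k=5):
    """Surrogate (assumed stand-in): mean throughput of the k nearest training configs."""
    def sq_dist(row):
        return sum((x - y) ** 2 for x, y in zip(row[0], cfg))
    nearest = sorted(train, key=sq_dist)[:k]
    return sum(y for _, y in nearest) / len(nearest)


def optimize(train, rng, n_candidates):
    """Black-box search: score random candidates on the surrogate, keep the best."""
    candidates = [sample_config(rng) for _ in range(n_candidates)]
    best = max(candidates, key=lambda c: knn_predict(train, c))
    return best, knn_predict(train, best)


if __name__ == "__main__":
    rng = random.Random(0)
    train = collect_training_data(rng, 200)
    best_cfg, predicted = optimize(train, rng, 2000)
    print(dict(zip(KNOBS, best_cfg)), round(predicted, 1))
```

The point of the design is that the expensive step (benchmarking the database) happens only while collecting training data. The optimizer then queries the cheap surrogate thousands of times. In practice the chosen configuration would still need to be validated with a real benchmark run.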
Findings
Up to 4% throughput improvement
Latency reductions of up to 43% (read) and 39% (write)
Feasibility demonstrated across various physical configurations
Abstract
NoSQL databases have become an important component of many big data and real-time web applications. Their distributed nature and scalability make them an ideal data storage repository for a variety of use cases. While NoSQL databases are delivered with a default "off-the-shelf" configuration, they offer configuration settings to adjust a database's behavior and performance to a specific use case and environment. The abundance and oftentimes imperceptible inter-dependencies of configuration settings make it difficult to optimize and performance-tune a NoSQL system. There is no one-size-fits-all configuration and therefore the workload, the physical design, and available resources need to be taken into account when optimizing the configuration of a NoSQL database. This work explores Machine Learning as a means to automatically tune a NoSQL database for optimal performance. Using Random…
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Water Quality Monitoring and Analysis
