Benchmarking of a new data splitting method on volcanic eruption data
Simona Reale, Pietro Di Stasio, Francesco Mauro, Alessandro Sebastianelli, Paolo Gamba, Silvia Liberata Ullo

TL;DR
This paper introduces a new data splitting method, based on a dissimilarity index and applied to volcanic eruption data, that improves model learning and performance compared to random and K-means splitting.
Contribution
The paper presents the Cumulative Histogram Dissimilarity (CHD) index and an iterative data splitting procedure, demonstrating its effectiveness over random and K-means splitting methods.
Findings
The proposed splitting method outperforms random and K-means splits in model performance.
Models trained with the new split run for slightly more epochs, which the paper interprets as deeper learning and better generalization.
Early stopping indicates optimal learning with the new splitting method.
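The findings above rely on how early stopping behaves, so a small illustration may help. The following is a minimal sketch of a common patience-based early stopping rule, not the paper's training code: the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Hypothetical helper: training stops once the validation loss has failed
    to improve by more than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran through every epoch.
    return len(val_losses), best_epoch


# Made-up validation curves, for illustration only.
stalls_early = [1.00, 0.80, 0.70, 0.71, 0.72, 0.73, 0.50]
keeps_improving = [1.00, 0.90, 0.80, 0.70, 0.60, 0.55, 0.50]
print(early_stopping_epoch(stalls_early))
print(early_stopping_epoch(keeps_improving))
```

Under a rule like this, a run that stops later is one whose validation loss kept improving for longer, which is the sense in which a higher epoch count can be read as continued learning rather than wasted training.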
Abstract
This paper presents a novel method for data splitting: an iterative procedure divides the input dataset, here volcanic eruption data chosen as the use case, into two parts using a dissimilarity index calculated on the cumulative histograms of the two parts. The Cumulative Histogram Dissimilarity (CHD) index is introduced as part of the design. In the reported results, the model trained with the proposed split achieves the best performance compared with both random splitting and K-means splitting implemented over different configurations, at the cost of a slightly higher number of epochs. This higher epoch count indicates that the model can learn more deeply from the input dataset, which is attributable to the quality of the splitting. Each model was trained with early stopping, which guards against overfitting, and the higher number of epochs in the proposed method demonstrates that early stopping…
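The abstract does not give the formula for the CHD index or the details of the iterative procedure, so the following is only a sketch under stated assumptions. It assumes that the index is the mean absolute gap between the normalized cumulative histograms of the two parts, that a lower value means a better split, and that candidate splits are drawn at random. The names `cumulative_histogram`, `chd_index`, and `iterative_split` and all parameters are hypothetical.

```python
import numpy as np


def cumulative_histogram(values, bins, value_range):
    """Normalized cumulative histogram of `values` on fixed, shared bins."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    total = counts.sum()
    if total == 0:
        return np.zeros(bins)
    return np.cumsum(counts) / total


def chd_index(part_a, part_b, bins=32):
    """Assumed dissimilarity: mean absolute gap between cumulative histograms.

    Returns 0.0 when both parts have identical histograms on the shared bins.
    This definition is an assumption, not the formula from the paper.
    """
    lo = min(part_a.min(), part_b.min())
    hi = max(part_a.max(), part_b.max())
    cdf_a = cumulative_histogram(part_a, bins, (lo, hi))
    cdf_b = cumulative_histogram(part_b, bins, (lo, hi))
    return float(np.mean(np.abs(cdf_a - cdf_b)))


def iterative_split(features, test_fraction=0.2, n_iter=200, seed=0):
    """Try `n_iter` random splits and keep the one with the lowest index.

    `features` is a 1-D array of per-sample values (for example, a summary
    statistic of each image); this choice is also an assumption.
    """
    rng = np.random.default_rng(seed)
    n = len(features)
    n_test = max(1, int(round(n * test_fraction)))
    best_score, best_perm = np.inf, None
    for _ in range(n_iter):
        perm = rng.permutation(n)
        score = chd_index(features[perm[n_test:]], features[perm[:n_test]])
        if score < best_score:
            best_score, best_perm = score, perm
    return best_perm[n_test:], best_perm[:n_test], best_score


# Synthetic stand-in data; no real eruption data is used here.
demo = np.random.default_rng(42).normal(size=500)
train_idx, test_idx, score = iterative_split(demo, n_iter=50)
print(len(train_idx), len(test_idx), score)
```

Random search is used here only because it is the simplest iterative scheme to write down; the published procedure may update the split differently.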
Taxonomy
Topics: Computational Physics and Python Applications · Geological Modeling and Analysis
Methods: Early Stopping
