Identifying Optimal Regression Models For DEM Simulation Datasets
B. D. Jenkins, A. L. Nicusan, A. Neveu, G. Lumay, F. Francqui, J. P. K. Seville, C. R. K. Windows-Yule

TL;DR
This paper evaluates various regression models for DEM datasets, proposing a framework to identify the most effective model, with a case study showing histogram-based gradient boosting as optimal for predicting packing fractions.
Contribution
It introduces a simple framework using k-fold cross-validation to select the best regression model for DEM data, demonstrated through a practical example.
Findings
In the case study, histogram-based gradient boosting outperformed the other regression models evaluated for predicting packing fractions.
The proposed framework effectively identifies optimal regression models.
The selected model balances accuracy and computational efficiency.
Abstract
Developing fast regression models (surrogate/metamodels) from DEM data is key for practical industrial application to allow real-time evaluations. However, benchmarking different models is often overlooked in particle technology for regression tasks, as model selection is frequently not the primary research focus. This can lead to the use of suboptimal models, resulting in subpar predictive accuracy, slow evaluations, or poor generalisation, hindering effective real-time decision-making and process optimisation. In this work, we discuss applying k-fold cross-validation to assess regression models for tabular DEM datasets and propose a simple framework for readers to follow to find the optimal model for their data. An example demonstrates its application to a DEM dataset of packing fractions measured in a simple measuring beaker with varying inter-particle properties, namely, average…
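The k-fold cross-validation comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or framework. The dataset is synthetic (two made-up input features and a made-up response standing in for a packing fraction), the two candidate models (a mean baseline and a small k-nearest-neighbour regressor) are simple stand-ins chosen so that the example needs only the Python standard library, and every function and class name here is hypothetical. In practice one would more likely swap in library regressors, such as a histogram-based gradient boosting implementation, behind the same `fit`/`predict` interface.

```python
import random
import statistics
import time


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MeanModel:
    """Baseline stand-in: always predicts the training-set mean."""

    def fit(self, X, y):
        self.mean = statistics.fmean(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class KNNModel:
    """Stand-in k-nearest-neighbour regressor (squared Euclidean distance)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Sort training points by distance to x, then average the k nearest targets.
            dists = sorted(
                (sum((a - b) ** 2 for a, b in zip(x, xt)), yt)
                for xt, yt in zip(self.X, self.y)
            )
            preds.append(statistics.fmean([yt for _, yt in dists[: self.k]]))
        return preds


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cross_validate(make_model, X, y, k=5, seed=0):
    """Return (mean RMSE over k held-out folds, total fit + predict time in seconds)."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    start = time.perf_counter()
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then score on fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = make_model().fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds = model.predict([X[j] for j in test_idx])
        scores.append(rmse([y[j] for j in test_idx], preds))
    return statistics.fmean(scores), time.perf_counter() - start


# Synthetic stand-in data (an assumption, NOT the paper's DEM dataset).
rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [0.64 - 0.1 * a - 0.05 * b * b + rng.gauss(0, 0.005) for a, b in X]

# Every candidate is scored on the same folds so the comparison is like for like.
candidates = {"mean baseline": MeanModel, "3-NN": lambda: KNNModel(k=3)}
results = {name: cross_validate(factory, X, y) for name, factory in candidates.items()}
best = min(results, key=lambda name: results[name][0])

for name, (score, seconds) in results.items():
    print(f"{name}: RMSE={score:.4f}, time={seconds:.3f}s")
print("lowest cross-validated RMSE:", best)
```

Reporting the wall-clock time next to the error reflects the abstract's concern with both predictive accuracy and evaluation speed; how the two are weighed when choosing a model is left to the reader here.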
