Improving predictions by nonlinear regression models from outlying input data
William W. Hsieh

TL;DR
This study shows that nonlinear regression models perform poorly on outlying environmental data, but a hybrid approach using linear extrapolation improves prediction reliability for outliers.
Contribution
The paper introduces NLR$_{\text{OR}}$, an approach based on Occam's Razor (OR) that combines nonlinear regression with linear extrapolation to improve prediction accuracy on outlying input data in the environmental sciences.
Findings
NLR outperforms LR on non-outliers.
NLR often underperforms LR on outliers.
NLR$_{\text{OR}}$ reduces poor extrapolations and improves outlier predictions.
Abstract
When applying machine learning/statistical methods to the environmental sciences, nonlinear regression (NLR) models often perform only slightly better and occasionally worse than linear regression (LR). The proposed reason for this conundrum is that NLR models can give predictions much worse than LR when given input data that lie outside the domain used in model training. Continuous unbounded variables are widely used in the environmental sciences, so it is not uncommon for new input data to lie far outside the training domain. For six environmental datasets, inputs in the test data were classified as "outliers" and "non-outliers" based on the Mahalanobis distance from the training input data. The prediction scores (mean absolute error, Spearman correlation) showed NLR to outperform LR for the non-outliers, but often underperform LR for the outliers. An approach based on Occam's Razor (OR)…
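The outlier classification step described in the abstract can be illustrated with a minimal sketch. It assumes the Mahalanobis distance is computed from the mean and covariance of the training inputs, and that a test point counts as an "outlier" when its distance exceeds a chosen quantile of the training distances. The function names, the `quantile` parameter, and its default of 0.99 are hypothetical choices for illustration; the abstract does not state the threshold actually used in the paper.

```python
import numpy as np


def mahalanobis_distances(X_train, X_new):
    """Mahalanobis distance of each row of X_new from the training inputs.

    Both arguments are 2-D arrays of shape (n_samples, n_features).
    """
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    mean = X_train.mean(axis=0)
    # Sample covariance of the training inputs (features in columns).
    cov = np.atleast_2d(np.cov(X_train, rowvar=False))
    # The pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = X_new - mean
    # Quadratic form diff_i^T cov_inv diff_i, evaluated row by row.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def split_outliers(X_train, X_test, quantile=0.99):
    """Boolean mask over X_test rows: True marks an "outlier".

    ASSUMPTION: the threshold is a quantile of the training distances;
    the paper's actual criterion is not given in the abstract.
    """
    d_train = mahalanobis_distances(X_train, X_train)
    threshold = np.quantile(d_train, quantile)
    d_test = mahalanobis_distances(X_train, X_test)
    return d_test > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 2))
    X_test = np.array([[0.0, 0.0], [10.0, 10.0]])
    print(split_outliers(X_train, X_test))
```

With a mask like this, prediction scores such as the mean absolute error can be computed separately for the outlier and non-outlier subsets of the test data, which is the comparison the abstract reports.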
Taxonomy
Topics: Hydrological Forecasting Using AI · Advanced Statistical Methods and Models · Data Analysis with R
Methods: Linear Regression
