Optimised Support Vector Regression for California Housing Price Prediction: The Critical Role of Feature Engineering and Hyperparameter Tuning
Emmanuel Adutwum

TL;DR
This study demonstrates that proper feature engineering and hyperparameter tuning significantly improve Support Vector Regression performance on the California Housing dataset, challenging prior assumptions of its weakness.
Contribution
It introduces a structured workflow combining feature engineering and hyperparameter tuning that substantially enhances SVR performance, achieving a 20% relative gain in R2.
Findings
Tuned SVR achieves an R2 of 0.723, outperforming the 0.60 previously reported by Preethi et al. (2025).
Feature scaling alone accounts for most of the performance gain.
Systematic hyperparameter tuning provides incremental improvements.
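As a quick check on the headline figures, and assuming the 20% relative gain is measured against the R2 of 0.60 reported by Preethi et al. (2025) rather than against an internal baseline, the arithmetic works out as:

```latex
\[
\frac{R^2_{\text{tuned}} - R^2_{\text{prior}}}{R^2_{\text{prior}}}
= \frac{0.723 - 0.60}{0.60}
= \frac{0.123}{0.60}
\approx 0.205
\]
```

That is about 20.5%, consistent with the stated gain under that assumption.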
Abstract
In the recent literature, Support Vector Regression (SVR) has been cited as one of the weakest performers on the California Housing benchmark dataset, with Preethi et al. (2025) specifically ranking it last among the algorithms they tested, reporting an R2 of only 0.60. This paper examines whether the previously reported performance reflects experimental configuration choices rather than an inherent algorithmic limitation. A structured experimental workflow is applied: ten domain-motivated derived features are constructed from the eight raw inputs, an exploratory ensemble feature importance analysis identifies the most predictive candidates, and a randomised search over hyperparameter combinations with three-fold cross-validation selects the optimal SVR configuration within a leakage-safe scikit-learn Pipeline. A formal four-stage ablation study isolates the contribution of each…
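The tuning step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the search ranges, the number of iterations, the RBF kernel choice, the helper name `build_search`, and the synthetic stand-in data are all assumptions, and the ten derived features are omitted because the summary does not list them. The point it shows is the leakage-safe part: the scaler sits inside the scikit-learn Pipeline, so it is re-fit on the training portion of each of the three folds.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_search(n_iter=10, random_state=0):
    """Randomised search over a scaled SVR pipeline with three-fold CV."""
    # Scaling lives inside the pipeline, so it is re-fit on each training
    # fold and never sees the held-out fold.
    pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
    # Hypothetical search ranges; the paper's actual ranges are not given here.
    param_distributions = {
        "svr__C": loguniform(1e-1, 1e2),
        "svr__gamma": loguniform(1e-3, 1e0),
        "svr__epsilon": loguniform(1e-2, 1e0),
    }
    return RandomizedSearchCV(
        pipe,
        param_distributions,
        n_iter=n_iter,
        cv=3,
        scoring="r2",
        random_state=random_state,
    )


# Synthetic stand-in data: eight features on very different scales,
# loosely mimicking the eight raw inputs (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8)) * np.array([1, 10, 5, 1, 1000, 3, 2, 2])
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
search = build_search().fit(X_train, y_train)
test_r2 = search.score(X_test, y_test)  # R2 on the held-out split
```

To try this on the real data, `sklearn.datasets.fetch_california_housing` could replace the synthetic arrays. Any derived features would best be computed in an extra pipeline step placed before the scaler, so that they are also recomputed inside each fold.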
