Optimised Support Vector Regression for California Housing Price Prediction: The Critical Role of Feature Engineering and Hyperparameter Tuning

Emmanuel Adutwum

arXiv:2605.08660·cs.LG·May 12, 2026

TL;DR

This study demonstrates that proper feature engineering and hyperparameter tuning significantly improve Support Vector Regression performance on the California Housing dataset, challenging prior assumptions of its weakness.

Contribution

It introduces a structured workflow combining feature engineering and hyperparameter tuning that substantially enhances SVR performance, achieving a 20% relative gain in R2.

Findings

01 Tuned SVR achieves an R2 of 0.723, against the 0.60 reported by Preethi et al. (2025).

02 Feature scaling alone accounts for most of the performance gain.

03 Systematic hyperparameter tuning provides further, incremental improvements.
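The second finding can be illustrated with a small sketch. This is not the paper's ablation and does not reproduce its numbers: the data are synthetic, and the column scales, sample size, and default SVR settings are all assumptions chosen to show why an RBF-kernel SVR is sensitive to unscaled inputs. One irrelevant column is given a range about a thousand times larger than the two informative ones, so it dominates the kernel distance unless the features are standardised.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): two informative features in [0, 1]
# and one irrelevant feature in [0, 1000].
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([
    rng.uniform(0, 1, n),
    rng.uniform(0, 1, n),
    rng.uniform(0, 1000, n),
])
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.05, n)

# Stage A: SVR on raw features.
unscaled = SVR(kernel="rbf")
# Stage B: the same SVR, but with standardisation fitted inside each CV fold.
scaled = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

unscaled_r2 = cross_val_score(unscaled, X, y, cv=3, scoring="r2").mean()
scaled_r2 = cross_val_score(scaled, X, y, cv=3, scoring="r2").mean()

print(f"mean CV R2 without scaling: {unscaled_r2:.3f}")
print(f"mean CV R2 with scaling:    {scaled_r2:.3f}")
```

Putting the scaler inside the pipeline means it is refitted on the training portion of every fold, so the comparison between the two stages is not contaminated by statistics from held-out data.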

Abstract

In the recent literature, Support Vector Regression (SVR) has been cited as one of the weakest performers on the California Housing benchmark dataset, with Preethi et al. (2025) specifically ranking it last among the algorithms they tested, reporting an R2 of only 0.60. This paper examines whether the previously reported performance reflects experimental configuration choices rather than an inherent algorithmic limitation. A structured experimental workflow is applied: ten domain-motivated derived features are constructed from the eight raw inputs, an exploratory ensemble feature importance analysis identifies the most predictive candidates, and a randomised search over hyperparameter combinations with three-fold cross-validation selects the optimal SVR configuration within a leakage-safe scikit-learn Pipeline. A formal four-stage ablation study isolates the contribution of each…
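The workflow described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: the two ratio features, the search ranges, the number of search iterations, and the synthetic eight-column data are all hypothetical stand-ins. Only the overall shape (derived features, scaling, SVR, and a randomised search with three-fold cross-validation inside a scikit-learn Pipeline) comes from the abstract.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVR


def add_ratio_features(X):
    """Append two illustrative ratio features (hypothetical, not the paper's ten)."""
    X = np.asarray(X, dtype=float)
    eps = 1e-9  # guard against division by zero
    ratio_a = X[:, 1] / (X[:, 2] + eps)
    ratio_b = X[:, 3] / (X[:, 4] + eps)
    return np.column_stack([X, ratio_a, ratio_b])


def build_search(n_iter=10, random_state=0):
    """Randomised search over SVR hyperparameters with 3-fold CV."""
    pipe = Pipeline([
        ("derive", FunctionTransformer(add_ratio_features)),
        ("scale", StandardScaler()),  # refitted on each training fold
        ("svr", SVR(kernel="rbf")),
    ])
    # Search ranges are assumptions chosen for illustration.
    param_distributions = {
        "svr__C": loguniform(1e-1, 1e2),
        "svr__gamma": loguniform(1e-3, 1e0),
        "svr__epsilon": loguniform(1e-3, 1e0),
    }
    return RandomizedSearchCV(
        pipe,
        param_distributions,
        n_iter=n_iter,
        cv=3,
        scoring="r2",
        random_state=random_state,
    )


# Synthetic stand-in with eight raw inputs (assumption); the real dataset
# could be loaded with sklearn.datasets.fetch_california_housing instead.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(300, 8))
y = X[:, 0] + X[:, 1] / X[:, 2] + rng.normal(0, 0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
search = build_search().fit(X_train, y_train)
test_r2 = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"held-out R2: {test_r2:.3f}")
```

Because the feature derivation and the scaler are pipeline steps, every cross-validation fold fits them on its own training split only, which is one way to obtain the leakage-safe behaviour the abstract refers to.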
