Wine Quality Prediction with Ensemble Trees: A Unified, Leak-Free Comparative Study
Zilang Chen

TL;DR
This study benchmarks five ensemble learning methods for wine quality prediction using a rigorous, leak-free workflow on Vinho Verde datasets, providing insights into model performance, feature importance, and computational efficiency.
Contribution
It introduces a comprehensive, reproducible benchmarking pipeline for ensemble models on wine quality data, highlighting the most effective models and feature subsets.
Findings
Gradient Boosting achieves the highest score, with a weighted F1 of about 0.69 on red wine (0.693 +/- 0.028).
The top five features capture most of the predictive information, reducing dimensionality by about 55% (from 11 attributes to 5).
Random Forest offers the best cost-efficiency for production use.
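Since weighted F1 is the headline metric behind these findings, a minimal sketch of how it can be computed may help. The function name `weighted_f1` and the sample labels are hypothetical and only for illustration; the study itself does not state which implementation it used.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)          # number of true instances per class
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp                # true instances of cls that were missed
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n_cls / total) * f1  # weight by class frequency
    return score

# Hypothetical quality labels, not data from the study.
y_true = [5, 5, 5, 6, 6, 7]
y_pred = [5, 5, 6, 6, 6, 5]
print(round(weighted_f1(y_true, y_pred), 3))
```

Because each class is weighted by its frequency, frequent quality grades dominate the score, which is one reason such a metric is usually read alongside per-class results on imbalanced data.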
Abstract
Accurate and reproducible wine-quality assessment is critical for production control yet remains dominated by subjective, labour-intensive tasting panels. We present the first unified benchmark of five ensemble learners (Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost) on the canonical Vinho Verde red- and white-wine datasets (1,599 and 4,898 instances, 11 physicochemical attributes). Our leakage-free workflow employs an 80:20 stratified train-test split, five-fold StratifiedGroupKFold within the training set, per-fold standardisation, SMOTE-Tomek resampling, inverse-frequency cost weighting, Optuna hyper-parameter search (120-200 trials per model) and a two-stage feature-selection refit. Final scores on untouched test sets are reported with weighted F1 as the headline metric. Gradient Boosting achieves the highest accuracy (weighted F1 0.693 +/- 0.028 for red and 0.664…
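Two of the steps named in the abstract, per-fold standardisation and inverse-frequency cost weighting, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, and the weighting formula N / (K * n_c) is one common convention that the abstract does not confirm.

```python
from collections import Counter
from statistics import mean, pstdev

def inverse_frequency_weights(labels):
    """Assumed convention: weight class c by N / (K * n_c), so rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def fit_scaler(train_column):
    """Learn mean and standard deviation from the training fold only."""
    mu = mean(train_column)
    sigma = pstdev(train_column) or 1.0  # fall back to 1.0 for constant columns
    return mu, sigma

def apply_scaler(column, mu, sigma):
    """Standardise any fold with statistics learned elsewhere."""
    return [(x - mu) / sigma for x in column]

# Hypothetical single-feature fold, not data from the study.
train_x, val_x = [9.4, 9.8, 10.5, 11.2], [10.0, 12.1]
train_y = [5, 5, 5, 6]

mu, sigma = fit_scaler(train_x)              # statistics come from the training fold
train_scaled = apply_scaler(train_x, mu, sigma)
val_scaled = apply_scaler(val_x, mu, sigma)  # validation fold is never used for fitting
weights = inverse_frequency_weights(train_y)
```

Fitting the scaler inside each fold, rather than once on the full dataset, is what keeps validation statistics from leaking into training.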
Taxonomy
Topics: Fermentation and Sensory Analysis · Wine Industry and Tourism · Horticultural and Viticultural Research
