# Prediction of SCA Scores in Specialty Coffee Using Machine Learning

**Authors:** Gabriel Rezende Ferraz, Felipe André Oliveira Freitas, Harim H. Baldi, Natally Ferreira Lima, Gabriela Maria Rodrigues

PMC · DOI: 10.1111/1750-3841.70946 · Journal of Food Science · 2026-02-28

## TL;DR

This study uses machine learning to predict coffee quality (SCA) scores from processing and production data, aiming to reduce reliance on time-consuming sensory evaluations.

## Contribution

The novel contribution is the development of efficient models that predict SCA scores from processing variables recorded during specialty coffee production.

## Key findings

- Random Forest models using all variables achieved the best performance (MAE = 0.80; RMSE = 1.03; R² = 0.53).
- Models with only seven predictors achieved nearly equivalent results (MAE = 0.81; RMSE = 1.06; R² = 0.50).
- Variable selection proved more efficient and robust than PCA for predicting SCA scores.
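The three metrics reported above can be reproduced with a few lines of standard-library Python. This is a minimal sketch using the textbook definitions (MAE as the mean absolute residual, RMSE as the square root of the mean squared residual, R² as 1 − SS_res/SS_tot). The summary does not say which software the authors used, and the scores in the example are invented for illustration, not taken from the study's data.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return MAE, RMSE and R² for paired observed and predicted SCA scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average size of a miss, in SCA points.
    mae = sum(abs(r) for r in residuals) / n
    # Root mean squared error: squares residuals, so large misses weigh more.
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # Coefficient of determination: share of score variance explained.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Invented cupping scores for four hypothetical lots (not from the study).
observed = [84.0, 85.5, 83.0, 86.0]
predicted = [84.5, 85.0, 84.0, 85.0]
metrics = regression_metrics(observed, predicted)
print(metrics)  # mae = 0.75, rmse ≈ 0.791, r2 ≈ 0.560
```

Because RMSE squares residuals before averaging, it is always at least as large as MAE and grows faster when a few lots are badly mispredicted, which is consistent with the reported RMSE of 1.03 sitting above the MAE of 0.80.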

## Abstract

Coffee is a major global commodity, with specialty coffees valued for their quality, assessed through standardized sensory protocols. The SCA (Specialty Coffee Association) score is a key indicator of commercial value, but sensory evaluation is resource‐intensive and subject to variability. This study developed predictive models to estimate SCA scores from processing and production‐related variables collected between 2019 and 2023, covering reception, fermentation, pulping, washing, drying, storage, and contextual production information. Random Forest (RF) and XGBoost (XGB) regression algorithms were applied using three approaches: complete variable set, Principal Component Analysis (PCA), and selection of the seven most relevant variables. The RF model with all variables achieved the best performance (MAE = 0.80; RMSE = 1.03; R² = 0.53). However, models using only seven predictors achieved nearly equivalent results (MAE = 0.81; RMSE = 1.06; R² = 0.50), with RF and XGB showing RMSE around 1.05 and R² above 0.50. PCA‐based models performed worse. In conclusion, variable selection proved more efficient and robust than PCA, enabling moderate but practically relevant prediction of SCA scores with reduced model complexity in specialty coffee production.

This research shows that machine learning models can help predict coffee quality scores using processing data. Such tools may support producers and cooperatives in monitoring quality earlier and more efficiently, reducing reliance on extensive sensory tests and improving decision‐making in specialty coffee production.
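The seven-predictor approach amounts to ranking candidate variables by relevance and refitting on the top seven. The summary does not state the ranking criterion, so the sketch below assumes a precomputed importance score per variable (for example, impurity-based importances from a fitted Random Forest). The functions `select_top_k` and `project`, the variable names, the importance values, and the lot record are all hypothetical placeholders, not the study's variables or results.

```python
from typing import Mapping, Sequence


def select_top_k(importances: Mapping[str, float], k: int = 7) -> list:
    """Return the k variable names with the highest importance.

    Ties are broken alphabetically so the selection is deterministic.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def project(rows: Sequence[Mapping[str, float]], features: Sequence[str]) -> list:
    """Keep only the selected variables, in a fixed column order, for each lot."""
    return [[row[name] for name in features] for row in rows]


# Hypothetical importance scores for nine candidate process variables.
importances = {
    "fermentation_hours": 0.21,
    "drying_days": 0.17,
    "moisture_pct": 0.14,
    "altitude_m": 0.12,
    "harvest_year": 0.10,
    "reception_temp_c": 0.08,
    "storage_days": 0.07,
    "washing_water_l": 0.06,
    "lot_weight_kg": 0.05,
}
top7 = select_top_k(importances, k=7)

# One hypothetical lot record containing all nine variables.
lot = {
    "fermentation_hours": 36.0,
    "drying_days": 18.0,
    "moisture_pct": 11.5,
    "altitude_m": 1100.0,
    "harvest_year": 2021.0,
    "reception_temp_c": 22.0,
    "storage_days": 60.0,
    "washing_water_l": 400.0,
    "lot_weight_kg": 500.0,
}
reduced = project([lot], top7)  # input rows for the reduced (seven-variable) model
print(top7)
print(reduced)
```

One practical difference from PCA: the selected inputs remain named process measurements that staff can record and act on, whereas each principal component is a linear combination of all original variables, so every variable must still be measured. To avoid optimistic error estimates, importances would be computed on training data only, before evaluating on held-out lots.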

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12949623/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12949623/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/PMC12949623/full.md

---
Source: https://tomesphere.com/paper/PMC12949623