# Optimized Preprocessing and Machine Learning for Quantitative Raman Spectroscopy in Biology

**Authors:** Emily E Storey, Amr S. Helmy

arXiv: 1904.02243 · 2019-04-05

## TL;DR

This paper introduces a statistical method to select optimal pre-processing techniques for Raman spectroscopy in biological samples, enhancing model robustness and reducing user-dependent variability in biofluid analysis.

## Contribution

The study presents a novel statistical approach to evaluate spectral variability, enabling automatic selection of pre-processing methods and optimal component number for more reliable Raman-based diagnostics.

## Key findings

- Improved predictive accuracy in biological fluid analysis.
- Reduced operator dependency in spectral pre-processing.
- Enhanced robustness of Raman models against biological variability.

## Abstract

Raman spectroscopy's capability to provide meaningful composition predictions is heavily reliant on a pre-processing step to remove insignificant spectral variation. This is crucial in biofluid analysis. Widespread adoption of diagnostics using Raman requires a robust model which can withstand routine spectral discrepancies due to unavoidable variations such as age, diet, and medical background. A wealth of pre-processing methods is available, and selecting the method which gives the best results is often left to trial and error or user experience. This process can be extremely time-consuming and inconsistent across operators.

In this study we detail a method to analyze the statistical variability within a set of training spectra and determine its suitability for forming a robust model. This allows us to selectively qualify or exclude a pre-processing method, predetermine robustness, and simultaneously identify the number of components which will form the best predictive model. We demonstrate the ability of this technique to improve predictive models of two artificial biological fluids.

Raman spectroscopy is ideal for noninvasive, nondestructive analysis. Routine health monitoring which maximizes comfort is increasingly crucial, particularly given epidemic-level diabetes diagnoses. High variability in spectra of biological samples can hinder Raman's adoption for these methods. Our technique determines the optimal pre-treatment method for the operator, so model performance is no longer a function of user experience. We foresee this statistical technique being an instrumental element in widening the adoption of Raman as a monitoring tool in the field of biofluid analysis.
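The abstract does not spell out which statistic the authors use, so the sketch below is not their method. It illustrates the general problem they address (choosing a pre-processing step and a component count together instead of by trial and error) using a common, generic substitute: k-fold cross-validated error of a principal component regression (PCR) over a grid of candidate transforms and component counts. The candidate transforms (`snv`, `first_derivative`), the helper names (`pcr_fit_predict`, `cv_rmse`, `select_model`), and the synthetic spectra in the demo are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical candidate pre-processing steps (illustrative, not the paper's list).
# Each acts row-wise (per spectrum), so applying it before cross-validation
# does not leak information between training and test folds.
def snv(X):
    """Standard normal variate: centre and scale each spectrum individually."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / sd

def first_derivative(X):
    """Finite-difference first derivative along the wavenumber axis."""
    return np.diff(X, axis=1)

def identity(X):
    """No pre-processing (raw spectra)."""
    return X

PREPROCESSORS = {"raw": identity, "snv": snv, "diff1": first_derivative}

def pcr_fit_predict(X_train, y_train, X_test, k):
    """Principal component regression with k components, computed via SVD."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                      # loadings, shape (n_features, k)
    T = Xc @ V_k                        # training scores, shape (n_train, k)
    b, *_ = np.linalg.lstsq(T, y_train - y_mean, rcond=None)
    return (X_test - x_mean) @ V_k @ b + y_mean

def cv_rmse(X, y, k, n_folds=5):
    """K-fold cross-validated RMSE of a k-component PCR model."""
    idx = np.arange(len(y))
    sq_err = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        pred = pcr_fit_predict(X[train], y[train], X[test], k)
        sq_err.append((pred - y[test]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))

def select_model(X, y, max_k=10, n_folds=5):
    """Grid search: return (best_rmse, preprocessor_name, n_components)."""
    best = None
    for name, transform in PREPROCESSORS.items():
        Xp = transform(X)
        for k in range(1, max_k + 1):
            err = cv_rmse(Xp, y, k, n_folds)
            if best is None or err < best[0]:
                best = (err, name, k)
    return best

if __name__ == "__main__":
    # Synthetic two-band "spectra" with per-sample intensity drift and baseline
    # offset, standing in for nuisance variability (not data from the paper).
    rng = np.random.default_rng(0)
    wn = np.linspace(0, 1, 200)
    peak_a = np.exp(-((wn - 0.3) / 0.02) ** 2)   # analyte band
    peak_b = np.exp(-((wn - 0.7) / 0.03) ** 2)   # matrix band
    conc = rng.uniform(0, 1, 60)
    scale = rng.uniform(0.8, 1.2, (60, 1))
    offset = rng.uniform(0, 0.2, (60, 1))
    X = scale * (conc[:, None] * peak_a + peak_b) + offset
    X += rng.normal(0, 0.01, X.shape)
    print(select_model(X, conc, max_k=8))
```

Picking the pair with the lowest cross-validated error is only one possible criterion. A variability-based test such as the one the authors describe could instead disqualify a transform before any regression is fitted. In practice, the candidate list would also include methods such as baseline subtraction or Savitzky-Golay smoothing.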

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.02243/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1904.02243/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1904.02243/full.md

---
Source: https://tomesphere.com/paper/1904.02243