Investigating Spectral Biomarker Candidates for Migratory Potential in Cancer Cells Using Micro-FTIR and O‑PTIR Spectroscopy
Elisabeth Holub, Nikolaus Hondl, Kai-Lan Lin, Marjaana Parikainen, Cecilia Sahlgren, Bernhard Lendl, Georg Ramer

TL;DR
This paper explores new methods using infrared spectroscopy to identify cancer cell migration traits without traditional staining techniques.
Contribution
The study introduces O-PTIR spectroscopy as a novel method for label-free cancer cell analysis with improved spatial resolution.
Findings
O-PTIR spectroscopy shows potential for identifying IR tumor markers with better spatial resolution than FTIR.
Machine learning models were developed to classify cells based on spectral features related to migratory properties.
The custom-built O-PTIR instrument outperformed commercial FTIR in microfluidic channel measurements.
Abstract
Routine diagnostic practice for cancer and metastasis relies on a time-consuming staining process and the use of antibodies to detect selected molecular markers and is hence limited by a lack of real-time data and the availability of molecular information. Against this background, techniques based on rapid chemical analysis to identify migratory properties are highly desirable. Fourier-Transform Infrared (FTIR) microspectroscopy has a long history in the label-free identification of infrared marker bands for cancer detection. However, it requires extensive postprocessing of the acquired spectra, is of limited suitability for analysis in aqueous environments, and has poor spatial resolution. To overcome these challenges, we are using a new method termed Optical Photothermal Infrared (O-PTIR) spectroscopy to detect local absorption to establish potential IR tumor markers and…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
1
2
3
4
5
6
7|
|
|
|
|
|
|
|
|---|---|---|---|---|---|---|
| 1210 | 1311 | 491 | 0.00037 | 175 | 521 | 0.00205 |
| 1229–1231 | 1061 | 741 | 0.16529 | 170 | 526 | 0.00152 |
| 1236 | 1004 | 798 | 0.37258 | 163 | 533 | 0.00098 |
| 1242–1243 | 943 | 859 | 0.71809 | 156 | 540 | 0.00062 |
| 1456–1458 | 1359 | 443 | 6.9e-05 | 403 | 293 | 0.33014 |
| 1541 | 773 | 1029 | 0.26737 | 378 | 318 | 0.59811 |
| 1545–1548 | 702 | 1100 | 0.08421 | 396 | 300 | 0.39602 |
| 1592–1595 | 935 | 867 | 0.77073 | 232 | 464 | 0.03904 |
| 1645–1648 | 726 | 1076 | 0.12902 | 265 | 431 | 0.14044 |
| 1725–1729 | 1051 | 751 | 0.19343 | 464 | 232 | 0.03904 |
| 1748–1749 | 1517 | 285 | 8.6e-08 | 304 | 392 | 0.43699 |
- —?sterreichische Nationalstiftung f?r Forschung, Technologie und Entwicklung10.13039/100010132
- —Horizon 2020 Framework Programme10.13039/100010661
- —Bundesministerium f?r Arbeit und Wirtschaft10.13039/100031306
- —Research Council of Finland10.13039/501100002341
- —Research Council of Finland10.13039/501100002341
- —Research Council of Finland10.13039/501100002341
- —Research Council of Finland10.13039/501100002341
- —Suomen Kulttuurirahasto10.13039/501100003125
- —Jane ja Aatos Erkon S??ti?10.13039/501100004012
- —Christian Doppler Forschungsgesellschaft10.13039/501100006012
- —Sigrid Jus?liuksen S??ti?10.13039/501100006306
- —Svenska Kulturfonden10.13039/501100007247
- —Sy?p?s??ti?10.13039/501100010711
- —Technische Universit?t Wien Bibliothek10.13039/501100012650
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpectroscopy Techniques in Biomedical and Chemical Research · Cancer Research and Treatment · thermodynamics and calorimetric analyses
Introduction
1
Current diagnosis of cancer and metastasis is based on the qualitative examination of stained tissue sections by a pathologist, which involves a time-consuming staining process and cannot provide real-time information for the intraoperative assessment of tumors. ?,? Hence, metastatic sites are detected by imaging tests only after the cancer has spread, while little attempts are made in identifying high-risk patients before a metastatic event.
Moreover, since traditional histology lacks the ability to capture molecular data, treatment decisions are usually guided by the analysis of spatial patterns in stained histological sections, supported by immunohistochemistry, i.e., the use of antibodies to detect the presence and location of certain molecular markers.? However, only a limited number of such markers can be detected simultaneously, making the characterization of a multifactorial disease such as cancer a challenging task.? Dye-free chemical profiling of aggressive and nonaggressive cancer cells and tissues can therefore provide valuable information for risk-adapted monitoring and treatment of cancer.
A noninvasive, label-free and biologically specific method, Fourier-Transform Infrared (FTIR) microspectroscopy has shown great diagnostic potential in disease and cancer detection. The application of the FTIR technique to histological samples gave rise to the term “spectral histopathology” ?,? and includes efforts to distinguish between healthy and tumor cells, ?,?,?,? the establishment of cancer markers, ?,? and the spectral analysis of several stages of metastasis.?
However, the optical diffraction limit restricts the (lateral) spatial resolution of infrared (IR) microscopy to several microns. Furthermore, the strong absorption of infrared radiation by water prevents the routine use of IR spectroscopy in wet tissues or aqueous media. Despite recent technological advances in mid-IR-laser-based microscopes and imaging systems, which allow for significantly faster analysis, IR microscopy is still an absorption-based technique, and measurement in aqueous samples remains challenging.
More recently, a novel method termed Optical Photothermal Infrared (O-PTIR) spectroscopy has broken new ground in overcoming the above-mentioned challenges, achieving submicrometer lateral resolution while at the same time providing the same spectral information as probed in FTIR (micro)spectroscopy. ?,?
O-PTIR is a label-free pump–probe technique that combines the biological specificity of IR spectroscopy and the spatial resolution of optical microscopy by using an IR pump and a visible probe. The absorption of the (focused) IR laser induces a local change in the refractive index of the sample. This local refractive-index perturbation acts as a lens and modifies the diameter of the transmitted probe beam. When an aperture is placed before the detector, the power transmitted through the aperture varies with the probe beam diameter, which in turn depends on the strength of the thermal lens and thus the absorbed IR power.
When the IR beam is modulated, the change in detected probe laser power has the same frequency as the pump laser repetition rate and can be extracted from the transmission signal using a lock-in amplifier, referenced to the laser repetition rate. High-frequency modulation of the IR beam and demodulation of the visible probe make it possible to detect minute refractive-index changes against a strong background. The visible laser spot can be focused on a very small area of interest, further reducing the contribution from surrounding material and giving the technique great potential to detect small variations in local chemical structure. With its visible-light lateral resolution, O-PTIR spectroscopy is better suited for studying sample differences at the subcellular level than traditional IR spectroscopic modalities.
Moreover, the use of a visible probe bypasses the strong absorption of infrared radiation by water, allowing infrared spectra to be acquired even in aqueous environments.? This gives the O-PTIR technique a decisive advantage over FTIR microspectroscopy and makes it more suitable for the analysis of hydrated tissues and live cells.
In the present study, our aim was to demonstrate that spectra recorded with our home-built O-PTIR instrument exhibit the same qualitative features as spectra obtained with a commercial FTIR microspectrometer. We then established a classification model to distinguish between aggressive and nonaggressive cancer cells, comparing FTIR and O-PTIR spectra. The primary goal of this work was to verify that migratory cells could be detected in O-PTIR spectra and differentiated from nonmigratory cells using only a limited subset of spectral bands.
A highly aggressive breast cancer cell line was modified to obtain a migratory and a nonmigratory cell type. Two primary endothelial cell lines served as healthy controls. Multiple spectra of all cell samples were collected with a commercial FTIR microspectrometer and a custom confocal O-PTIR instrument (Figure S1). The O-PTIR microscope was designed in transmission mode, making it a versatile tool for measurement in liquid environments and on-chip applications.
Training and test sets were randomly generated for each of the FTIR and O-PTIR data sets for variable selection by the Least Absolute Shrinkage and Selection Operator (LASSO) and classification by Linear Discriminant Analysis (LDA). Past research indicates that supervised machine learning approaches perform better in terms of accuracy when the number of classes is small, and are more robust to imbalanced data.? In this study, there were only 3 cell labels, which were known, making supervised algorithms the natural choice.
LDA is a well-known supervised learning technique that solves multiclass classification problems by modeling the data distribution for each class. The goal of maximizing the separation between different classes is met by finding a linear combination of features that maximizes the ratio of between-class variance to within-class variance. ?−? ? Due to its straightforward concept and wide applicability, LDA has been used for decades and is readily available in software libraries.
A major drawback of LDA is that it cannot be applied when the number of variables is much larger than the number of observations, which is a common problem in spectral analysis. In this case, the within-class covariance matrix of the features is singular, resulting in instability and overfitting. ?,? This limitation can be bypassed by reducing the number of input variables via regularization, dimension reduction or variable preselection. ?,? Regularization directly modifies the objective function of the classifier,? making it difficult to pinpoint which features are most influential. Instead of directly penalizing the discriminant function, dimension reduction techniques generate new variables from the original ones to preserve as much information as possible.? Again, this approach masks the importance of individual variables and decreases the interpretability of the model. Conversely, variable selection methods identify the most important features while discarding less relevant ones, thus simplifying the model and improving its interpretability.
For this reason, a Least Absolute Shrinkage and Selection Operator (LASSO) was employed prior to classification to select the most relevant features for each data set, reducing the spectral data to only a few wavenumbers. Ridge regression was also tested but resulted in a much larger subset of wavenumbers for the desired classification accuracy, making it difficult to identify distinct spectral marker candidates.
The selected wavenumber subset was subjected to histogram and statistical analysis to understand the importance of the selected wavenumbers for each cell class. Final classification was carried out using a linear discriminant analysis (LDA) model with multifold cross-validation.
The wavenumber ranges output by the LASSO (Tables S2 and S3) correspond to IR cancer markers previously suggested in the literature, such as the symmetric and asymmetric stretching modes of the phosphate bond, ?,?,? the Amide I? and Amide II ?,? protein bands, CH_2_ bending and asymmetric CH_3_ bending of proteins and lipids,? and the CO stretching of esterified lipids.?
In the present analysis, the phosphate and esterified lipid bands were most strongly related to differences between the migratory and nonmigratory cell groups. The migratory and nonmigratory classes could be distinguished in both data sets even though the class size was much lower for the O-PTIR data. These findings underscore the potential of infrared spectroscopy methods to identify subtle chemical differences in complex biological samples.
Experimental Section
2
Instruments and Measurement Parameters
2.1
FTIR microspectroscopy was performed with a commercial microscope (Hyperion 3000, Bruker, Billerica, MA, USA). The FTIR microspectrometer is equipped with a nitrogen-cooled focal plane array (FPA) detector for concurrent measurement of 64 × 64 spectra. These hyperspectral images were collected in transmission using a clean part of the substrate as background. Spectra were collected at a spectral resolution of 2 cm^–1^ and a pixel resolution of approximately 2.7 μm within the spectral range of 930 cm^–1^ to 3850 cm^–1^. The pixel resolution was determined from the detector’s field of view (170 × 170 μm^2^) and the number of pixels (64 × 64).
O-PTIR spectroscopy was performed using a custom confocal instrument described elsewhere? (Figure S1). Samples are irradiated with a tunable IR pump laser (MIRcat-QT-z, Daylight Solutions, San Diego, CA, USA) from below; a counter-propagating continuous probe laser (LBX-633, Oxxius, Lannion, France) detects the wavenumber-specific absorption in the sample. The spectral resolution, pulse width and modulation frequency settings of the IR laser were 1 cm^–1^, 500 ns and 50 kHz, respectively. The modulation frequency is a compromise between 1/f noise and the low-pass behavior of the photothermal measurement principle. The frequency of 50 kHz has been found to offer a good compromise. The spectral ranges for each of the four MIRcat-QT-z chips is reported in Table S1. The visible laser power was approximately 1 mW at the sample. In O-PTIR microscopy, lateral spatial resolution is dictated by the visible laser wavelength of 633 nm. For the custom-built instrument reported here, lateral spatial resolution was determined to be below 700 nm.?
Both instruments were flushed with dry air to remove atmospheric water vapor and thus avoid misrepresentation of the Amide I and II bands (1600 cm^–1^ to 1720 cm^–1^ and 1500 cm^–1^ to 1560 cm^–1^).?
Samples
2.2
A highly aggressive, triple-negative breast cancer cell line, MDA-MB-231 WT, was modified using CRISPR-Cas9 to obtain two cell types that would only differ in the migratory properties but were otherwise morphologically identical.? Triple-negative cells test negative for estrogen, progesterone and human epidermal growth factor receptor 2 (HER2) and are characterized by high aggressiveness and low chances of recovery.? A Jagged 1 (JAG1) knockout was performed to create a nonmigratory cell type (MDA-MB 231 KO). Two primary endothelial cell lines (HAoEC, human aortic endothelial cells, and HUVEC, human umbilical vein endothelial cells) were grown for 4 to 6 passages. The measurements of both HAoEC and HUVEC were combined to form a single “Healthy” control group.
There is ample evidence that JAG1 is a ligand for Notch receptors that plays a major role in tumor cell growth, invasion and metastasis. ?−? ? JAG1 suppression reduces migration and invasive capacity in cancer cells. ?−? ? More specifically, JAG1 knock-down has been demonstrated to impair MDA-MB-231 cell motility, while JAG1 overexpression promotes MDA-MB-231 cell proliferation.? Furthermore, JAG1 is strongly expressed by tumor vasculature? and has been associated with vascular invasion, higher tumor grade and poor outcomes in breast cancer patients. ?,?,?
Cell lines were cultured in their respective optimal growth media. MDA-MB-231 WT and MDA-MB-231 KO cells were maintained in high-glucose Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum (FBS), 2 mmol L^–1^ glutamine, 100 units/mL penicillin, and 100 μg mL^–1^ streptomycin. HAoEC and HUVEC, along with their specific culture media, were purchased from PromoCell (Heidelberg, Germany). HAoEC were cultured in Endothelial Cell Growth Medium MV with SupplementMix, and HUVEC in Endothelial Cell Growth Medium 2 with SupplementMix. All cell cultures tested negative for mycoplasma and were maintained at 37 °C in a humidified atmosphere containing 5% CO_2_.
For spectral analysis, 50 000 cells per cell line were seeded onto CaF_2_ glass (Crystran Ltd., Poole, UK) of 13 mm diameter and 1 mm thickness and allowed to adhere overnight. The following day, cells were fixed with 4% PFA for 10 min at room temperature, then stored in PBS buffer at 4 °C.
Before analysis, PBS buffer supernatant was decanted and samples were rinsed at least 5 times with distilled water to remove excess PBS and prevent salt crystal formation. The same sample types were used for comparative analysis, but different replicates of the parent cells were used.
Collection and Processing of Spectra
2.3
Vibrational spectra are affected by noise and measurement artifacts. Noise sources include environmental fluctuations and instrument noise, while artifacts include optical artifacts, sample impurities, as well as scattering and standing-wave effects.? As these unwanted signal variations can mask spectral features and have a significant impact on the performance of quantitative models,? it is indispensable to remove them from the underlying model data. The preprocessing steps comprise the removal of outliers (e.g., bad pixel correction, removal of spectra with low signal-to-noise ratio (SNR)), baseline correction, smoothing, scattering correction, and normalization.
All postprocessing and data formatting operations were carried out using Python 3, in particular the NumPy ?,? and xarray? packages as well as the pybaselines? and scikit-learn? libraries.
FTIR
2.3.1
The FTIR instrument was set to record 64 × 64 points per image; 64 scans were averaged per pixel. In total, spectra from approximately 100 cells per cell class (MDA-MB 231 WT, MDA-MB 231 KO, Healthy) were collected from two different sample batches. The exact class sizes are reported in Table S5. The SNRs at 1650 cm^–1^, calculated per class as the quotient of the mean intensity and the standard deviation, was 50 for MDA-MB 231 WT, 72 for MDA-MB 231 KO, 47 for HUVEC and 51 for HAoEC.
Several postprocessing steps are required to ensure comparability of individual FTIR measurements. The postprocessing workflow is illustrated in Figure. First, hyperspectral images (a) were segmented into regions of interest (ROIs, i.e., cells) and low-signal regions, i.e., the background (b). Spectra of low intensity and ROIs containing less than 4 spectra were discarded.
Processing workflow for FTIR hyperspectral images. a) FPA image obtained from the integrated area under the Amide I band (1580 cm–1 to 1710 cm–1), b) segmentation into ROIs with cell regions (green) and background (dark blue), c) raw spectra, d) baseline correction (BC), e) spectra after BC and EMSC correction, f) resulting average cell spectrum after normalization to the Amide I band.
The remaining 6410 raw spectra (c) were smoothed by a first-order Savitzky–Golay filter with a window length of 7. The smoothing filter settings were adjusted to balance noise suppression and spectral fidelity. Correction for Mie scattering was performed using the Extended Multiplicative Signal Correction (EMSC) package of the openvibspec library.? Mie scattering occurs from (near-)spherical sample structures and has been shown to severely distort infrared band shapes.? The choice to use EMSC was based on its effectiveness in the removal of Mie scattering artifacts from single-cell measurements.?
Next, the baseline was removed (d) using an asymmetric least-squares (AsLS) algorithm? of the pybaseline library.? The AsLS approach is based on asymmetric weights to correct baseline drifts without distorting the spectral bands. It has proven to be most useful for spectra with distinct peaks and smooth baseline drifts,? which is why it was deemed appropriate for this data collection.
After baseline correction, the spectra were trimmed to the wavenumber range of interest (1760 cm^–1^ to 1000 cm^–1^). Due to the use of CaF_2_ coverslips and windows, low wavenumbers below 1000 cm^–1^ are strongly attenuated and cannot be reliably detected. The upper range limit was based on the fact that biological molecules do not absorb in the region of 1800 cm^–1^ to 2800 cm^–1^.? The exact cutoff was set so as to exclude as much noise as possible without truncating the esterified lipid CO band. Subsequently, the cell mean was calculated from all the measurements collected in the same ROI (e).
In a final step, the spectra were normalized to the Amide I band (f). The Amide I absorption band is representative of the protein backbone. It is a strong, broad and isolated region in the spectrum and can therefore serve as a benchmark for the overall protein content in biological samples.
O-PTIR
2.3.2
In the custom O-PTIR setup, several spectra were recorded as single-point spectra, averaging three spectra per measurement. Measurement locations were chosen systematically for each cell. To begin, the cell nucleus was located via the visible image and the optics were adjusted to obtain the maximum signal. The spectra were then collected at a series of points approximately 500 nm away from the signal maximum in the cardinal directions (up, down, left, right) and, in addition, at a point on the edge of the cells. One sample batch comprising 25–30 cells per class was analyzed with the O-PTIR system. Because the group sizes were considerably smaller than in the collection of FTIR spectra, a standard-deviation approach was not deemed feasible to estimate the noise in the O-PTIR data. As a consequence, all spectra were corrected for the laser chip crossover (see below) and normalized to the Amide I band, and the noise was taken as the mean intensity in the region of 980 cm^–1^ to 1000 cm^–1^, where no O-PTIR signal is present. The SNRs, calculated as the quotient of the class-wise mean intensity at 1650 cm^–1^ and the class-wise mean noise, were 31, 31, 24, and 24 for MDA-MB 231 WT, MDA-MB 231 KO, HUVEC and HAoEC cells, respectively.
The photothermal signal is the difference in the transmitted probe laser intensity with and without the absorption of the pump laser radiation by the sample. The change in the transmitted intensity has the same frequency as the pump laser modulation, while the baseline intensity of the probe laser corresponds to the low-frequency component of the transmission signal. For baseline correction, all raw spectra were therefore divided by the low-frequency part of the transmitted visible laser intensity before further processing.
In a next step, a strong signal artifact resulting from the EC-QCL crossover transition between Chip 3 and Chip 4 had to be removed from the spectra. To this end, the values in the affected spectral region (1265 cm^–1^ to 1355 cm^–1^) were excluded from further analysis. In Figure, the missing values were interpolated for better visualization.
Typical O-PTIR (a) and FTIR (b) spectra after processing. The spectra shown represent all HUVEC cell means included in the classification step. The gap in the O-PTIR spectrum is due to a crossover point in this region.
After calculating average cell spectra, these were smoothed using a first-order Savitzky–Golay filter with a window size of 12 and normalized to the Amide I band. Again, the filter parameters were chosen to preserve the main spectral features but remove unwanted noise. Higher filter orders were tested but did not improve the result. An example of O-PTIR and FTIR spectra after the processing procedure is presented in Figure.
Classification and Selection Algorithms
2.4
All algorithms were used as implemented in the Python scikit-learn module,? version 1.3.1.
Training and Test Set Split
2.4.1
Each data set was split into a training subset and a test subset considering the weight of each class. In general, for p parameters to be fitted in the regression model, it is favorable to use a splitting ratio of .? Assuming that only 5% of the available data points are selected by the LASSO subset, the 380 wavenumbers recorded by the FTIR microspectrometer in the range of interest would be reduced to a mere 19 wavenumbers, yielding a training/test set ratio of 4:1. Although this ratio would be higher for the O-PTIR spectra if we considered the larger number of data points (higher spectral resolution), the low number of samples would make cross-validation unreliable. Consequently, the training/test split ratio was set to 80:20 for both the FTIR and O-PTIR data sets. The classification routine was repeated six times with random assignment to training and test sets.
LASSO
2.4.2
The Least Absolute Shrinkage and Selection Operator (LASSO) regression finds a linear least-squares fit to a data set by adding a penalty parameter α. The LASSO algorithm solves the following optimization problem:?
where X is the covariate matrix, y is the response variable, m is the coefficient vector, with ∥m∥1 = |b 1| + |b 2| + ..., and N is the number of samples.
The LASSO reduces the number of features on which the solution depends, thereby decreasing the likelihood of overfitting. Using gradient descent approximation to iteratively solve the above optimization problem, it is suitable for variable selection in collinear data sets like spectra.?
A series of penalty values α (?) was input to find the best regression model by cross-validated accuracy score before the final model was computed.
Linear Discriminant Analysis
2.4.3
After the data set had been trimmed to the wavenumber subset selected by the LASSO routine, Linear Discriminant Analysis (LDA) was carried out to build a classifier.
The solver was set to eigenvalue decomposition, which involves the estimation of the covariance matrix. Although matrix calculations are computationally intense, computation time is expected to be reasonable for a reduced number of features like in this case.
Shrinkage regularization was found to boost the performance of the algorithm even for a reduced number of spectral features. Shrinkage improves the generalization of the covariance matrix when the number of training samples is smaller than the number of features.? The optimum shrinkage value was determined via the GridSearchCV module, which brute-forces a range of shrinkage parameters and selects the one with the highest metric.
Each LDA model was cross-validated. Cross-validation was performed with 10 folds for the FTIR data and 3 folds for the much smaller O-PTIR data set.
Results and Discussion
3
Spectral Marker Candidates Identified by the
LASSO Algorithm
3.1
For the FTIR measurements, classifier performance was best for 7–10 wavenumbers (Table S1), whereas 9–13 marker candidates were selected for classification of the O-PTIR data (Table S2) depending on the test run. The number of features within the selected wavenumber subset is related to the LASSO penalty (see section). The slightly larger number of features required for the O-PTIR model can be explained by a higher variation in this data set owing to the higher spectral (1 cm^–1^) and spatial (<700 nm in the lateral direction) resolution of the O-PTIR instrument.? Apart from the more local character of the spectra, the smaller sample size of the O-PTIR data set increases uncertainty in the estimation of parameters.
This having been said, the wavenumbers of the selected spectral features differ slightly between the two data sets but are overall consistent to each other in terms of associated functional groups. The candidate markers identified by LASSO preselection are discussed in detail in the following sections.
Phosphates
3.1.1
The symmetric (ν_s_PO_2_ ^–^) and asymmetric (ν_as_PO_2_ ^–^) stretching modes of the phosphate bond ?,? are indicators of the DNA and RNA conformation. These spectral regions have been suggested as potential cancer markers ?,? and have been a key factor in the classification of breast cancer types into subcategories with varying aggressiveness.?
Potential phosphate markers emerged from the analysis of both data sets. In the FTIR data, the algorithm detected C–O stretching of the phosphodiester groups at 1060 cm^–1^ to 1066 cm^–1^,? while ν_as_PO_2_ ^–^ modes were prominent in the O-PTIR analysis. These were found at 1210 cm^–1^, 1229 cm^–1^ to 1231 cm^–1^, 1236 cm^–1^, and 1242 cm^–1^ to 1243 cm^–1^.
Similar marker bands have been suggested for ovarian cancer cells? and cervical cancer cells.? In ovarian cancer cells, the ν_s_PO_2_ ^–^ band intensity at 1086 cm^–1^ and the ν_as_PO_2_ ^–^ intensity at 1242 cm^–1^ differed significantly between various cell lines.? In precancerous cervical cells, a reduced intensity in both ν_s_PO_2_ ^–^ and ν_as_PO_2_ ^–^ was found at 1080 cm^–1^ and at 1237 cm^–1^, respectively.?
A caveat here is that the cells analyzed in this study were fixed and stored in PBS, which may lead to intracellular phosphate accumulation. Although all cell samples were prepared and stored under the same conditions and by the same laboratory technician, bias due to differences in phosphate uptake or release during sample preparation cannot be fully excluded.
Amide I and II
3.1.2
Proteins are chains of amino acids that are linked by peptide bonds.? The amide bands reflect the secondary structure of proteins, that is, the local folding patterns of this polypeptide chain. ?,? The two most common folding patterns are the α helix and the β sheet.?
The Amide I vibration absorbs near 1650 cm^–1^ and is dominated by the CO stretching vibration, with contributions from the CN stretching, the CCN deformation, and the NH bending vibrations.? The conformation of the protein backbone dictates the degree to which the various vibrations contribute to the Amide I mode, making it a sensitive probe of protein secondary structure.?
The Amide II mode, centered at 1550 cm^–1^, is mainly a combination of the NH bending vibration and the CN stretching vibration.? Minor contributions come from the CO bending as well as the CC and NC stretching vibrations. The Amide II vibration also reflects secondary structure, but its relationship with the contributing vibrations is more complex than that for the Amide I mode.?
Differences in the alpha helical and beta-sheet structures of Amide I have been described between normal and ovarian cancer cells as well as among different types of ovarian cancer cells, suggesting possible structural alterations.? Similarly, a shift of the Amide II band to lower wavenumbers was reported in metastatic cells versus primary melanoma cells.? In abnormal cervical cells, the Amide II band at 1539 cm^–1^ was weaker than in normal cells.?
In the FTIR measurements, the LASSO algorithm selected Amide II features at 1552 cm^–1^ to 1558 cm^–1^, 1563 cm^–1^, and 1573 cm^–1^, as well as Amide I features at 1627 and 1629 cm^–1^. The Amide II features at 1541 cm^–1^, 1545 cm^–1^ to 1548 cm^–1^, and 1592 cm^–1^ to 1595 cm^–1^, and the Amide I band at 1645 cm^–1^ to 1648 cm^–1^ emerged as potential markers from the O-PTIR spectra.
CH2/CH3 Bending
3.1.3
The spectral region of 1450 cm^–1^ to 1465 cm^–1^ is associated with the CH_2_ scissoring and asymmetric CH_3_ bending modes (δ_as_CH_3_) of proteins and lipids. ?,?,?
The relative intensity of the peak at 1448 cm^–1^ emerged as a prominent feature in several breast cancer cell lines.? In another study scrutinizing the metastatic properties of different cell types, the relative intensity at 1456 cm^–1^ was found to be increased in metastatic versus nonmetastatic colon and melanoma cells, but lower in metastatic MDA-MB-231 cells versus MCF-7 cells.?
In the O-PTIR data, a corresponding candidate marker band was detected at 1456 cm^–1^ to 1458 cm^–1^.
Ester CO Stretching of Phospholipids
3.1.4
The reprogramming of the lipid metabolism has been recognized as a hallmark of cancer and metastasis. ?−? ? In FTIR analysis, the CO stretching of phospholipids, a main component of the cell membrane,? has been associated with carcinogenesis and migration. In a study aimed at distinguishing between SKBr3 and MCF-7 breast cancer cells, the band ratio of 1650 cm^–1^/1740 cm^–1^ emerged as one of several statistically significant marker candidates.? By the same token, altered band intensities at 1741 cm^–1^ were observed in several cancerous cell lines as compared to normal ovarian cells.? Furthermore, it has been reported that fibroblasts developed a distinct peak at 1721 cm^–1^ after stimulation with MCF-7 breast cancer cells; in addition, the band centered at 1735 cm^–1^ decreased in intensity after cancer stimulation.?
Ester CO vibrational bands were selected in both data sets, at 1743 cm^–1^ for FTIR spectra, and at 1725 cm^–1^ to 1729 cm^–1^ and 1748 cm^–1^ to 1749 cm^–1^ for O-PTIR spectra.
Distribution of Marker Candidates for Different
Cell Classes
3.2
To establish the relationship of the potential markers identified by the LASSO with cancer and migration properties, the corresponding measurement values were subjected to statistical analysis. As the primary focus of this article is on the novel method of O-PTIR, only the O-PTIR measurements will be discussed in detail. The distribution plots for the FTIR data are provided in the Supporting Information (Figures S2 and S3). Since the selected subset differed slightly for each trial run, a total of 26 wavenumbers emerged from the LASSO regression on the O-PTIR data. Such a variation in selected wavelengths is not unexpected, as the absorption at neighboring wavelengths is often correlated, yielding multiple wavelengths that contain essentially the same information. Some of the marker candidates were therefore merged into groups and replaced by the group mean.
Histograms were plotted by cell class and potential marker band to visualize differences in signal intensities. As evident from Figures and ?, some values are not normally distributed. Hence, a Mann–Whitney U test was performed for each selected marker band to quantify the differences between cell classes (Table, Figure).
1: Mann–Whitney U Statistics and p Values per Cell Group and Selected Band for the O-PTIR Data Set
Mann–Whitney U test statistics for Cancer/Healthy and MDA-MB 231 WT/MDA-MB 231 KO groups for each IR marker band.
Cancer Cells versus Endothelial Cells
3.2.1
In a first analysis, the MDA-MB 231 WT and MDA-MB 231 KO cells were combined to form a “Cancer” cell class. Figure shows the distribution of signal values for healthy endothelial cells versus cancer cells. The y axis indicates the number of samples, and the x axis represents the intensity, which was grouped into 12 bins.
Histograms showing the distribution of O-PTIR signal intensities for “Cancer” (MDA-MB 231 WT + MDA-MB 231 KO, gray) and “Healthy” (HUVEC+HAoEC, light purple) cell types. Twelve bins were used here. The dark purple areas mark the overlap between the two groups.
Overall, there is a strong overlap between the two cell classes as illustrated by the dark purple areas. For the potential phosphate markers at 1210 cm^–1^, 1229 cm^–1^ to 1231 cm^–1^, and 1236 cm^–1^, the signal is generally higher in the cancer samples. This is in line with the results disclosed by Amjad et al.,? where a higher band intensity at 1209 cm^–1^ and a lower intensity at 1238 cm^–1^ were reported. However, only the measurement at 1210 cm^–1^ turned out to be significant (Mann–Whitney U test, p < 0.01).
A similar trend can be observed for the CH_2_/CH_3_ band at 1456 cm^–1^ to 1457 cm^–1^ as well as the esterified lipid bands at 1725 cm^–1^ to 1729 cm^–1^ and 1748 cm^–1^ to 1749 cm^–1^. The CH_2_/CH_3_ band and the 1748 cm^–1^ to 1749 cm^–1^ band resulted significant (p < 0.01). A change in the band around 1740 cm^–1^ was also described by Li et al.? in their FTIR analysis of ovarian cancer cells.
In the Amide-II bands at 1541 cm^–1^ and 1545 cm^–1^ to 1548 cm^–1^, the distribution of values is shifted to lower intensities for the cancer measurements. This is also true for the marker region detected in the Amide-I band (1645 cm^–1^ to 1648 cm^–1^). Although not statistically significant, these trends corroborate the results reported in similar studies. ?,?
MDA-MB 231 WT versus MDA-MB 231 KO Cells
3.2.2
To analyze the invasive properties, a comparison was made between the MDA-MB 231 WT and MDA-MB 231 KO specimens (Figure). A slightly lower number of 10 intensity bins was chosen because of the lower number of samples in the two cancer groups alone (as opposed to the entire data set).
Histograms showing the distribution of O-PTIR signal intensities for MDA-MB 231 WT (“WT”, gray) and MDA-MB 231 KO (“KO”, light purple) cell types. Ten bins were used. The dark purple areas mark the overlap between the two groups.
In the ν_as_PO_2_ ^–^ region, the intensity at the bands of 1210 cm^–1^, 1229 cm^–1^ to 1231 cm^–1^, 1236 cm^–1^, and 1242 cm^–1^ to 1243 cm^–1^ was higher in the KO samples. Indeed, the Mann–Whitney U test revealed a significant difference (p < 0.05) between the MDA-MB 231 WT and the MDA-MB 231 KO samples in all the detected phosphate marker candidates. Two other bands resulted significant (p < 0.05): the band at 1592 cm^–1^ to 1595 cm^–1^, which was lower for highly invasive cells, and the potential marker at 1725 cm^–1^ to 1729 cm^–1^, which was higher in the MDA-MB 231 WT group.
Classification by LDA
3.3
After the data set had been reduced to the wavenumbers selected by the LASSO routine, the remaining data were randomly allocated to a training and test set, and a classification model was built using LDA. Randomization and classification were repeated 6 times.
Figures and ? display the confusion matrices for the FTIR and O-PTIR data set, respectively. The percentages are mean values obtained from the 6 classification runs as described above. Diagonal and off-diagonal elements correspond to the percentages of correct and incorrect predictions.
Confusion matrix for the classification of MDA-MB 231 WT, MDA-MB 231 KO and Healthy cell classes using the O-PTIR data. The matrix shows the mean classification values over 6 runs, with diagonal elements representing correct and off-diagonal elements representing incorrect predictions.
Confusion matrix for the classification of MDA-MB 231 WT, MDA-MB 231 KO and Healthy cell classes using the FTIR data. The matrix shows the mean classification values over 6 runs, with diagonal elements representing correct and off-diagonal elements representing incorrect predictions.
The mean classification accuracy (Table S4) was 93 ± 3% for the large FTIR data set; the weighted F1 score, sensitivity, and specificity were 93 ± 2%, 93 ± 2%, and 96 ± 1% (Tables S5, S7, and S8). It should be noted here that the classification accuracy may be influenced by interbatch differences, which were not assessed in this study due to a lack of data and considerable differences in batch size.
In the O-PTIR measurements, the classes were identified with a mean accuracy of 81 ± 6% (Table S4), a weighted F1 score of 81 ± 6%, a weighted sensitivity of 81 ± 6%, and a weighted specificity of 90 ± 4% (Tables S6–S8). The accuracies were verified by cross-validation (Tables S9 and S10).
There is some confusion between the cell lines, which is somewhat expected given the small chemical difference between the MDA-MB 231 WT and MDA-MB 231 KO groups. The higher spatial and spectral resolution of the O-PTIR technique and the small number of samples reduce the overall classification accuracy for the O-PTIR data. The small sample number is also reflected by the higher uncertainty of 6%. In addition, the mean cell spectra comprised only 5–6 point spectra, which may lead to greater variability in the O-PTIR data. In view of the above, these classification results are preliminary and are expected to change with a larger pool of analyzed cells, the number of points averaged per cell, and the number of batches included in the analysis.
Drawing direct comparisons to immunohistochemical research is challenging at best. There is a high degree of heterogeneity in cell and tissue types, which greatly influences the performance of antibody markers. As the sensitivity and specificity of antibodies vary greatly, cancer detection often requires a combination of markers that is tailored to a specific tumor or cell type, making it impossible to pinpoint standards for sensitivity and specificity that are valid across cells and tissues of different origin. As an example for research on triple-negative breast cancers, Thike et al.? were able to differentiate basal-like cancers within triple-negative cancers using a combination of three markers with a specificity of 100% and a sensitivity of 78%. With regard to individual markers, the sensitivity and specificity in identifying primary breast carcinoma have been reported to be 93% and 92% for mammaglobin, and 85% and 99% for GCDFP-15.?
Concluding Summary
4
Our findings confirm that the spectra recorded with the home-built O-PTIR instrument exhibit the same qualitative features as spectra obtained with a commercial FTIR microspectrometer.
The potential marker bands selected by the LASSO routine as distinctive features for the highly invasive MDA-MB 231 WT cells, noninvasive MDA-MB 231-KO cells and the healthy controls (HUVEC and HAoEC) include the Amide I and Amide II, phosphate, CH_2_/CH_3_ bending and esterified lipid spectral regions. A larger number of variables was needed to establish a classification model based on the O-PTIR measurements, which reflects the higher spatial and spectral resolution of the technique. Despite the higher variability in the O-PTIR as compared to the FTIR spectra, the candidate markers output by the LASSO are consistent for both data sets.
In-depth analysis of the O-PTIR marker candidates showed differences between the cancerous (MDA-MB 231 WT + MDA-MB 231 KO) and the healthy populations as well as between the MDA-MB 231 WT and MDA-MB 231 KO cell lines, which only differed in the presence or absence of the JAG1 notch ligand. Our findings suggest vibrational bands distinguishing cancer and healthy samples in the ν_as_PO_2_ ^–^, CH_2_/CH_3_ bending and esterified lipid regions. For the MDA-MB 231 WT and MDA-MB 231 KO groups, significant differences were found for all LASSO-selected ν_as_PO_2_ ^–^ bands, in the Amide II region and in the esterified lipid region. These results are in line with existing evidence implicating phosphate and esterified lipid vibrational groups as potential markers in the study of cancer and migratory properties at the cellular level. ?,?,?
Subsequent classification by LDA yielded a mean sensitivity and specificity of 93 ± 2% and 96 ± 1% for the FTIR versus 81 ± 6% and 90 ± 4% for the O-PTIR spectra. The reason for the lower performance of the algorithm on the O-PTIR data set lies in the higher spatial and spectral resolution of the O-PTIR instrument and the smaller group sizes. O-PTIR cell measurements comprised a smaller number of spectra per cell, which may not be enough to even out local chemical heterogeneities in biological samples. These classification results are expected to improve with a larger number of measurements and cells. Furthermore, this study does not address the influence of interbatch effects and a larger number of measurements per cell on the classification accuracy. More research is needed to improve the generalization of these preliminary results beyond the current setup and translate them into clinical applicability.
Notwithstanding these limitations, the fact that the algorithm can identify each class in both data sets (even with lower accuracy) substantiates the general validity of the approach taken and underscores the potential of the O-PTIR technique as a tool for the detection of high-risk invasive cells.
These results confirm the capabilities of our transmission-mode O-PTIR prototype and have established a baseline for future research. Building on the experience gained in this work, further efforts will be dedicated to the study of biological samples in aqueous environments and on-chip systems. This will provide further insights into the applicability of O-PTIR spectro-microscopy to the study of hydrated tissues and live cells in aqueous media.
Supplementary Material
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Amjad M.Ullah H.Andleeb F.Batool Z.Fourier-Transform Infrared Spectroscopy (FTIR) for Investigation of Human Carcinoma and Leukaemia Laser Eng.202151217233
- 2Haaland D. M.Jones H. D. T.Thomas E. V.Multivariate Classification of the Infrared Spectra of Cell and Tissue Samples Appl. Spectrosc.19975134034510.1366/0003702971940468 · doi ↗
- 3Simonova D.Karamancheva I.Application of Fourier Transform Infrared Spectroscopy for Tumor Diagnosis Biotechnol. Biotechnol. Equip.2013274200420710.5504/BBEQ.2013.0106 · doi ↗
- 4Wu J.Xu Y.Sun C.Soloway R. D.Xu D.Wu Q.Sun K.Weng S.Xu G.Distinguishing malignant from normal oral tissues using FTIR fiber-optic techniques Biopolymers 20016218519210.1002/bip.101311391568 · doi ↗ · pubmed ↗
- 5Li R.Allen H. C.Developing transferable and universal IR biomarkers for intraoperative colorectal cancer diagnosis via FTIR spectroscopy Sci. Rep.2025152294410.1038/s 41598-025-05068-z 40596143 PMC 12215629 · doi ↗ · pubmed ↗
- 6Gurcan M.Boucheron L.Can A.Madabhushi A.Rajpoot N.Yener B.Histopathological Image Analysis: A Review IEEE Rev. Biomed. Eng.2009214717110.1109/RBME.2009.203486520671804 PMC 2910932 · doi ↗ · pubmed ↗
- 7Duraiyan J.Govindarajan R.Kaliyappan K.Palanisamy M.Applications of immunohistochemistry J. Pharm. Bioallied Sci.20124 S 30710.4103/0975-7406.10028123066277 PMC 3467869 · doi ↗ · pubmed ↗
- 8Holton S.Walsh M.Kajdacsy-Balla A.Bhargava R.Label-Free Characterization of Cancer-Activated Fibroblasts Using Infrared Spectroscopic Imaging Biophys. J.20111011513152110.1016/j.bpj.2011.07.05521943433 PMC 3177049 · doi ↗ · pubmed ↗
