Infrared spectral data of natural and man-made textile fibres for material identification and classification
Reeha Parkar, Angelica Jain, Miranda Prendergast-Miller, Thomas Stanton, Kelly J. Sheridan, Matteo D. Gallidabino

TL;DR
This paper introduces a dataset of infrared spectra from natural and man-made textile fibers to aid in material identification and classification for forensic and environmental research.
Contribution
The paper provides a structured, openly accessible dataset of ATR-FTIR spectra from verified textile samples with multiple processing levels for analytical use.
Findings
The dataset includes 160 spectra from 137 verified textile samples, covering both natural and man-made fibers.
Multiple processing levels are provided, including raw data and pre-processed spectra for machine learning workflows.
The dataset supports material identification, spectral comparison, and algorithm benchmarking in forensic and environmental science.
Abstract
This article presents a structured dataset of attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectra acquired from natural and man-made textile fibres, compiled to support research in forensic, analytical, and environmental science. The collection comprises 160 spectra obtained from 137 verified textile samples originating from multiple sources, including industry reference collections and academic textile archives. The dataset includes several processing levels: (1) raw instrument files in PerkinElmer .sp format; and (2–5) tabular feature matrices containing unprocessed transmission spectra, baseline-corrected spectra, averaged baseline-corrected spectra grouped by fibre subtype, and fully pre-processed spectra prepared for chemometric and machine-learning workflows. All spectra were recorded using a Frontier FT-IR spectrometer by PerkinElmer equipped with a…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCultural Heritage Materials Analysis · Dyeing and Modifying Textile Fibers · Mineralogy and Gemology Studies
Specifications TableSubjectEngineering & Materials scienceSpecific subject areaATR-FTIR characterisation of textile fibres for forensic, materials, and environmental applications.Type of data*Raw instrument files (.sp);**Exported feature tables, before and after pre-processing (.csv);**Python script (.py).*Data collectionSpectra were collected using a PerkinElmer Frontier FT-IR spectrometer equipped with a single-reflection diamond ATR accessory. Samples were analysed in transmission mode over the 4000–550 cm⁻¹ range at 4 cm⁻¹ resolution with four co-added scans. Multiple spectra were acquired per sample, and one or several high-quality measurements were retained. All spectra were acquired using PerkinElmer Spectrum software v10.4.00 and processed using Python v3.12.0.Data source locationKing’s Forensics, Department of Analytical, Environmental and Forensic Sciences, King’s College London, United KingdomData accessibilityRepository name: Mendeley DataData identification number: 10.17632/rx3fjgz96x.3Direct URL to data: https://doi.org/10.17632/rx3fjgz96x.3All files are openly available under a Creative Commons CC BY 4.0 licence and can be downloaded individually or as a complete dataset package.Related research articleNone
Value of the Data
1
- •These data provide a comprehensive reference collection of infrared spectra from natural and man-made textile fibres, measured under consistent instrumental conditions. The dataset covers a wide variety of materials commonly found in clothing and household fabrics, offering a standardised foundation for fibre characterisation and comparison.
- •The dataset can be reused to develop, benchmark, and validate algorithms for spectral matching, material classification, or fibre identification across forensic, analytical, and environmental science applications. It is suitable for a wide range of computational approaches, including chemometrics and machine-learning models.
- •The inclusion of multiple processing levels supports varied analytical needs. Raw transmittance spectra allow users to apply their own workflows; baseline-corrected spectra serve as ready-to-use reference profiles for visual or manual identification; and fully pre-processed spectra are optimised for chemometric and machine-learning applications.
- •The metadata accompanying each sample allow users to relate spectral features to material descriptors such as fibre class, type, and subtype. This supports extended applications such as building spectral libraries, studying variability within and between textile categories, and evaluating spectral features relevant to discrimination tasks.
- •The different spectral pre-treatment pipelines implemented were evaluated using support vector machine (SVM) and random forest (RF) classifiers under cross-validation to enable objective comparison. The resulting performance metrics provide a reference baseline for future studies using comparable ATR-FTIR data and classification approaches.
- •The dataset will benefit forensic laboratories involved in fibre examination, environmental scientists investigating textile microfibre pollution, materials scientists and polymer chemists engaged in material identification, and data scientists developing and benchmarking chemometric and machine-learning models. It also provides an open teaching resource for students and educators in spectroscopy, textile analysis, and applied computational modelling.
Background
2
The dataset was compiled to support research on the spectroscopic characterisation and classification of textile fibres using infrared techniques. Fibre identification is an essential task in both forensic and environmental science [[1], [2], [3], [4]], where analysts frequently need to determine material composition from small samples. Attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopy provides a rapid, non-destructive method for distinguishing between fibre types based on their polymer composition and characteristic absorption features [[5], [6], [7], [8]].
Existing reference collections are often limited in scope, heterogeneous in acquisition conditions, or not publicly accessible, hindering method validation and comparative studies. To address this gap, a consistent series of ATR-FTIR measurements was acquired from a broad range of natural and man-made fibres under standardised conditions. The dataset aims to facilitate reproducible research in fibre identification, classification, and spectral modelling by providing openly available, well-documented reference spectra suitable for chemometric and machine-learning applications.
Data Description
3
The dataset, available on Mendeley Data [9], consists of ATR-FTIR spectra of textile fibre samples, organised into six main folders with consistent sample identifiers linking all files. The total number of spectra is 160, corresponding to 137 distinct fibre samples and covering 26 fibre subtypes distributed across natural and man-made textile categories. A summary overview of the dataset content is provided in Table 1, and an overview of the distribution of fibre classes, types and subtypes, based on the number of samples, represented in the collection is shown in Fig. 1.Table 1. Summary of dataset content.Table 1 dummy alt textFolderFormatDescriptionFiles01_raw_spectra/.spOriginal instrument files16002_feature_matrix_raw/.csvTabular feature matrix of unprocessed transmission spectra103_feature_matrix_baseline_corrected/.csvTabular feature matrix of baseline-corrected spectra104_feature_matrix_baseline_corrected_average/.csvTabular feature matrix of average baseline‐corrected spectra per subtype105_feature_matrices_preprocessed/.csvTabular feature matrices of pre-processed spectra prepared for chemometric and ML analyses306_other_files/.csv / .pySample metadata and Python preprocessing companion script2Fig. 1Sunburst diagram illustrating the composition of the dataset based on the number of fibre samples across fibre classes, types, and subtypes. The inner ring shows fibre origins (natural or man-made), the middle ring shows fibre types, and the outer ring represents the corresponding subtypes included in the collection. Segment sizes and the numeric labels shown within each segment both correspond to the number of samples assigned to each category.Fig 1 dummy alt text
All tabular feature matrices (folders 02–05) share a consistent structure. Each row represents one measurement, with columns including a general spectrum identifier (Spectrum_ID), the sample of origin (Source_ID), the replica number (Replica) and categorical descriptors (Origin, Type, Subtype). These are followed by spectral variables corresponding to wavenumbers from 4000 cm⁻¹ to 550 cm⁻¹. The numeric values represent transmittance (0–1) for datasets in folders 02–04 and processed spectral intensities for the fully pre-processed dataset in folder 05. In addition, folder 06 contains a metadata table providing sample-level descriptors. Each row represents one material sample, including a unique sample identifier (Sample_ID), the source institution or supplier (Source), the reference collection from which the material was obtained (Collection), and categorical descriptors (Origin, Type, Subtype). Where available, additional descriptors are provided, including the manufacturer (Manufacturer), trade name (Trade_name), dye status (Dyed), and any further relevant notes (Additional_Details). These fields enable traceability of each spectrum to its original material source.
Folder and file structure:
- 1.01_raw_spectra/ (160 SP files) — Contains original instrument output files in PerkinElmer .sp format. These files can be opened using software such as PerkinElmer Spectrum, Spectragryph, or other FT-IR data viewers supporting .sp files.
- 2.02_feature_matrix_raw/ (1 CSV) — Tabular feature matrix of unprocessed transmission spectra. Each row corresponds to a measurement.
- 3.03_feature_matrix_baseline_corrected/ (1 CSV) — Tabular feature matrix of baseline-corrected spectra generated using the asymmetric least-squares (ALS) method.
- 4.04_feature_matrix_baseline_corrected_average/ (1 CSV) — Tabular feature matrix of average baseline-corrected spectra grouped by fibre subtype (e.g., wool, nylon 6).
- 5.05_feature_matrix_preprocessed/ (1 CSV) — Tabular feature matrices of fully pre-processed spectra prepared using three alternative preprocessing pipelines: (i) conversion to absorbance, ALS baseline correction, and SNV scatter correction; (ii) conversion to absorbance, ALS correction, SNV, and Savitzky–Golay smoothing (15-point window, first derivative); and (iii) conversion to absorbance, ALS correction, SNV, and Savitzky–Golay smoothing (15-point window, second derivative).
- 6.06_other_files (1 CSV, 1 Python script) — Contains a metadata table providing sample-level information and a Python companion script enabling users to apply the same preprocessing workflow to their own spectra.
Representative examples of spectra at different processing stage (folders 02, 03 and 05) are illustrated in Fig. 2.Fig. 2. Representative ATR-FTIR spectra illustrating the effect of each processing stage for one natural fibre (left) and one man-made fibre (right). Rows show: (a-b) raw transmission spectra (%T); (c-d) baseline-corrected spectra after ALS correction (%T); and (e-f) fully pre-processed spectra following absorbance conversion, ALS correction, SNV, and Savitzky–Golay smoothing (15-point window, first derivative).Fig 2 dummy alt text
Experimental Design, Materials and Methods
4
Sampling - Textile fibre samples were collected from a range of verified sources to capture the diversity of materials encountered in clothing and household textiles. Samples were obtained from Microtrace LLC (Elgin, USA), specifically from their Forensic Fiber Reference Collection and Arbidar Natural Fiber Collection, and from the internal textile collections of the UNUSUWUL group at the University of the Arts London (London, UK) and the Bio-Couture group at Northumbria University (Newcastle, UK). All samples were supplied with confirmed identities, as provided by the respective suppliers. Only pure (non-blended) fibres were included. Samples were provided in various forms, including fibre tufts, yarns, and woven or knitted swatches. Variation in finishing and manufacturing treatments was intentionally retained to reflect real-world heterogeneity. The fibre classification scheme used in this study was developed based on common forensic practice and informed by the nomenclature principles outlined in ISO 2076 [10]. The resulting hierarchical structure is shown in Fig. 3.Fig. 3. Hierarchical fibre classification scheme adopted for structuring the dataset, illustrating the organisation of textile fibres from origin (natural or man-made) to fibre type and subtype. This scheme was used to define the categorical descriptors reported in the dataset and to support consistent grouping of fibre materials.Fig 3 dummy alt text
Spectral analysis - Spectral measurements were carried out using a Frontier FT-IR spectrometer by PerkinElmer (Shelton, UK) equipped with a single-reflection diamond ATR accessory. Each sample was placed in direct contact with the ATR crystal and secured using the built-in pressure clamp. Spectra were acquired in transmission mode over the 4000–550 cm⁻¹ range at 4 cm⁻¹ spectral resolution with four co-added scans per acquisition. The applied contact pressure was typically set to 100 a.u., although this and the sample positioning were occasionally adjusted to optimise spectral quality. Adjustments were made to achieve a stable 100 % transmittance baseline (avoiding values above 100 %) and a total transmission range of at least 20 %, where possible. Spectra not meeting these quality conditions, or exhibiting excessive noise or visible artefacts attributable to poor crystal contact, were excluded from the final dataset. A background spectrum was collected before each analysis, and the ATR crystal was cleaned with acetone between samples to prevent contamination. Acetone was purchased by Sigma-Aldrich and was HPLC grade (>99.9 %) Multiple spectra were often acquired per sample, and one or several were retained. Replicates were collected to capture potential intra-sample heterogeneity (e.g., fibre morphology, surface treatments, or contact variability) and, where necessary, to improve representation of fibre subtypes with limited physical samples.
Data processing - All spectra were initially recorded as transmittance data (folder 02). Baseline correction was performed by first converting transmission spectra to absorbance, applying the asymmetric least-squares (ALS) algorithm, and subsequently converting the corrected spectra back to transmittance. This workflow was adopted because ALS operates by estimating and subtracting a baseline centred around zero intensity; therefore, conversion to absorbance (where the spectral baseline is approximately zero) is required to ensure mathematically appropriate correction. After baseline removal, spectra were reconverted to transmittance to preserve their conventional representation and facilitate visual comparison and reference use (folder 03). The same baseline-corrected spectra were then averaged by fibre subtype (e.g., wool, nylon 6) to produce a set of representative reference profiles (folder 04).
For use in chemometric and machine-learning applications, spectra were further processed using three alternative preprocessing pipelines, each provided as a separate tabular feature matrix (folder 05). In all pipelines, spectra were converted to absorbance to linearise intensity relationships, baseline-corrected by ALS to mitigate baseline drift, and normalised using standard normal variate (SNV) to reduce multiplicative scatter effects and variability due to fibre–crystal contact and effective pathlength. Two pipelines additionally applied Savitzky–Golay filtering to compute first- or second-derivative spectra (15-point window), which can enhance subtle spectral features and improve discrimination between chemically similar fibres by reducing broad background contributions and resolving overlapping bands, albeit at the cost of increased noise sensitivity.
These alternative preprocessing strategies are provided to support different classification approaches and modelling objectives. Their performance in supervised classification tasks is summarised in Table 2, as evaluated using the classification procedures described below.Table 2. Cross-validated classification accuracies (%) for different spectral preprocessing pipelines evaluated using support vector machine (SVM) and random forest (RF) classifiers. Results are reported for binary classification by fibre origin (natural vs man-made) and multi-class classification by fibre type, based on 5-fold cross-validation.Table 2 dummy alt textPipelineAccuracy [%]SVM modelRF modelBinaryMulti-classBinaryMulti-classALS + SNV93.1294.3885.6295.00*ALS + SNV + SG(15;D1)86.2588.7594.3888.12ALS + SNV + SG(15;D2)*72.5075.6283.1280.62
Classification modelling and preprocessing evaluation - To assess the suitability of the different spectral preprocessing pipelines described above for classification tasks, supervised classification models were trained using support vector machines (SVM) and random forests (RF). Both binary and multi-class classification problems were considered, where binary classification corresponded to discrimination by fibre origin (natural vs man-made) and multi-class classification corresponded to discrimination by fibre type. Prior to modelling, the preprocessed spectral data were subjected to dimensionality reduction using principal component analysis (PCA) to summarise spectral variability and reduce feature dimensionality. The first 10 principal components were retained, as they accounted for at least 90 % of the total variance across all preprocessing configurations, and were subsequently used as input variables for SVM and RF classification. Model hyperparameters were tuned using standard resampling procedures, and performance was evaluated using 5-fold cross-validation. Classification accuracy was calculated as the proportion of correctly classified spectra averaged across cross-validation folds (Table 2).
For reproducibility, all preprocessing parameters (including ALS and Savitzky–Golay settings) and modelling hyperparameters are summarised in Table 3.Table 3. Summary of preprocessing and modelling parameters used for baseline correction, spectral filtering, and supervised classification in this study.Table 3 dummy alt textCategoryParameterValueDescriptionALS baseline correctionλ (lambda)1 × 10⁶Smoothness parameterp0.001Asymmetry parametern_iter10Number of iterationsSavitzky–Golay filterwindow_length15Filter window sizepolyorder3Polynomial orderderiv1 or 2Derivative order (pipelines 2 and 3)SVM classifierkernelrbfRadial basis function kernelC10Regularisation parametergammascaleKernel coefficientRF classifiern_estimators200Number of treesmax_featuressqrtNumber of features per split
Software - All spectra were acquired using Spectrum software v10.4.00 by PerkinElmer. Data processing was performed in Python v3.12.0 [11] using the libraries Numpy [12], Pandas [13], SciPy [14] and Matplotlib [15] for baseline correction, spectral conversion, filtering, and data handling. All scripts used for preprocessing, model training, and performance evaluation are openly available in a GitHub repository [16]. A companion script is also included in the repository to enable users to apply the same correction and preprocessing pipeline to their own spectra, ensuring consistency with the procedures used to generate the processed datasets presented here.
Limitations
The dataset currently includes 26 fibre subtypes, covering the most common natural and man-made materials encountered in clothing and household textiles. Data collection is ongoing, and additional fibre subtypes will be incorporated over time. Although efforts were made to include variability in sample form and manufacturing treatment, the representation of some fibre categories remains more limited than others. Future updates will also expand the dataset by adding further spectra for previously sampled materials. No other major limitations were identified.
Ethics Statement
The current work does not involve human subjects, animal experiments, or any data collected from social media platforms.
CRediT authorship contribution statement
Reeha Parkar: Conceptualization, Methodology, Software, Formal analysis, Investigation, Data curation, Writing – original draft, Visualization. Angelica Jain: Investigation, Data curation. Miranda Prendergast-Miller: Resources, Data curation, Writing – original draft. Thomas Stanton: Resources, Data curation, Writing – original draft. Kelly J. Sheridan: Resources, Data curation, Writing – original draft. Matteo D. Gallidabino: Conceptualization, Methodology, Resources, Data curation, Writing – original draft, Supervision, Project administration, Funding acquisition.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Stanton T.Johnson M.Nathanail P.Mac Naughtan W.Gomes R.L.Freshwater and airborne textile fibre populations are dominated by ‘natural’, not microplastic, fibres Sci. Total Environ.666201937738910.1016/j.scitotenv.2019.02.27830798244 · doi ↗ · pubmed ↗
- 2Napper I.E.Parker-Jurd F.N.F.Wright S.L.Thompson R.C.Examining the release of synthetic microfibres to the environment via two major pathways: atmospheric deposition and treated wastewater effluent Sci. Total Environ.857202315931710.1016/j.scitotenv.2022.15931736220472 · doi ↗ · pubmed ↗
- 3Jones J.Johansson S.A population study of textile fibres on the seats at three public venues Forensic Sci. Int.348202311160410.1016/j.forsciint.2023.11160436801086 · doi ↗ · pubmed ↗
- 4Allan H.K.Fricker A.E.Hsieh Y.-L.A trace fiber population study on upholstered chairs in a military environment Forensic Sci. Int.358202411200610.1016/j.forsciint.2024.11200638547581 · doi ↗ · pubmed ↗
- 5Meleiro P.P.García-Ruiz C.Spectroscopic techniques for the forensic analysis of textile fibers Appl. Spectrosc. Rev.514201627830110.1080/05704928.2015.1132720 · doi ↗
- 6Peets P.Leito I.Pelt J.Vahur S.Identification and classification of textile fibres using ATR-FT-IR spectroscopy with chemometric methods Spectrochim. Acta A: Mol. Biomol. Spectrosc.173201717518110.1016/j.saa.2016.09.00727643467 · doi ↗ · pubmed ↗
- 7Sharma V.Mahara M.Sharma A.On the textile fibre’s analysis for forensics, utilizing FTIR spectroscopy and machine learning methods Forensic Chem.39202410057610.1016/j.forc.2024.100576 · doi ↗
- 8da Cruz Santos E.Silva A.A.B.Faria R.R.A.de Almeida Rizzutto M.Rodrigues P.H.S.Baruque-Ramos J.Raw cellulosic fibers: characterization and classification by FTIR-ATR spectroscopy and multivariate analysis (PCA and LDA)Mater. Circ. Econ.6120241310.1007/s 42824-024-00104-1 · doi ↗
