# Introducing OpenTextile-NIR: Near-infrared hyperspectral imaging and photography dataset for optical identification of textiles

**Authors:** Tuomas Sormunen, Ella Mahlamäki, Satu-Marja Mäkelä, Mikko Mäkelä

PMC · DOI: 10.1016/j.dib.2026.112559 · 2026-02-09

## TL;DR

This paper introduces an open-access dataset combining near-infrared hyperspectral imaging and RGB photography to help identify and sort textiles using optical methods.

## Contribution

The first open-access NIR hyperspectral dataset for textile optical identification, with annotated spectra and metadata to support machine learning and recycling research.

## Key findings

- The dataset contains over 11 million spectra from 71 textile samples, more than 6 million of which are annotated.
- It supports machine learning, image segmentation, and spectral classification for textile recycling applications.
- The dataset addresses limitations of prior research by providing a large, publicly accessible resource.

## Abstract

This dataset presents the first open-access collection of near-infrared hyperspectral imaging (NIR-HSI) data for the optical identification of textiles, with a focus on supporting research in sensor-based textile sorting and recycling. The dataset comprises hyperspectral images, RGB photographs, and detailed metadata, including fibre composition and colour, for 71 post-industrial textile samples collected in Finland. Over 11 million spectra are included in the hyperspectral images, with more than 6 million annotated, providing a robust foundation for machine learning and data analysis. In addition, we provide a single representative NIR spectrum and RGB value for each sample in order to accommodate classic spectroscopic analysis.

Used garments were sourced from a partner company specializing in end-of-life textile management, with ground truth information on fibre composition obtained from suppliers. Small pieces of each garment were measured using a Specim SWIR 3 hyperspectral camera and photographed with a high-resolution mobile phone camera (Samsung Galaxy A52). The dataset is organized into folders containing raw and processed data, including ENVI-format hyperspectral images, RGB images, and CSV files with mean spectra, mean RGB values, and sample metadata. An example Python script is provided to facilitate data access and processing.
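Since the hyperspectral images are stored in ENVI format, each image is normally accompanied by a plain-text header that records the cube dimensions, data type, interleave, and band wavelengths. The following is a minimal sketch of reading such a header using only the Python standard library. It is not the example script shipped with the dataset, and the header text below is made up for illustration: the dimensions, wavelengths, and the helper names `parse_envi_header` and `expected_data_bytes` are assumptions, not values or functions from the dataset.

```python
import re

# Bytes per pixel value for common ENVI "data type" codes.
ENVI_DTYPE_BYTES = {1: 1, 2: 2, 3: 4, 4: 4, 5: 8, 12: 2}

# Matches "key = value", where value is either a {...} block
# (possibly spanning several lines) or the rest of the line.
_FIELD = re.compile(
    r"^[ \t]*([^=\n]+?)[ \t]*=[ \t]*(\{.*?\}|[^\n]*)",
    re.MULTILINE | re.DOTALL,
)

def parse_envi_header(text):
    """Parse ENVI header text into a dict keyed by lower-cased field name."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("not an ENVI header")
    body = "\n".join(lines[1:])
    fields = {}
    for match in _FIELD.finditer(body):
        key = match.group(1).strip().lower()
        value = match.group(2).strip()
        if value.startswith("{") and value.endswith("}"):
            # Brace blocks hold comma-separated lists.
            value = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        fields[key] = value
    for key in ("samples", "lines", "bands", "data type",
                "header offset", "byte order"):
        if key in fields:
            fields[key] = int(fields[key])
    if "wavelength" in fields:
        fields["wavelength"] = [float(w) for w in fields["wavelength"]]
    return fields

def expected_data_bytes(fields):
    """Size of the binary cube implied by the header, excluding any offset."""
    per_value = ENVI_DTYPE_BYTES[fields["data type"]]
    return fields["samples"] * fields["lines"] * fields["bands"] * per_value

# Hypothetical header text; none of these values come from the dataset.
EXAMPLE_HEADER = """ENVI
samples = 4
lines = 3
bands = 5
header offset = 0
data type = 4
interleave = bil
byte order = 0
wavelength = {
 1000.0, 1375.0, 1750.0,
 2125.0, 2500.0
}
"""

header = parse_envi_header(EXAMPLE_HEADER)
cube_bytes = expected_data_bytes(header)
print(header["samples"], header["lines"], header["bands"], cube_bytes)
```

Comparing `expected_data_bytes` against the actual size of the binary file is a cheap sanity check before loading a cube. In practice, a dedicated reader such as the Spectral Python package may be more convenient than a hand-written parser.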

Potential reuse scenarios include classification of textiles by material or colour, prediction of natural fibre content, image segmentation, algorithm development for spectral classification, and use as a reference spectral library. The dataset’s comprehensive structure and open availability address the limitations of previous research, which often relied on small or non-public datasets, and are intended to accelerate advances in optical identification technologies for textile recycling.
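To illustrate the reference-library scenario, the sketch below matches an unknown spectrum to the closest entry in a small library using the spectral angle, a common similarity measure that ignores overall brightness. This is one possible approach rather than a method described in the paper. The spectra, the labels, and the function names `spectral_angle` and `classify` are all invented for illustration; real mean spectra would come from the dataset's CSV files.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

def classify(spectrum, library):
    """Return the library label whose spectrum has the smallest angle."""
    return min(library, key=lambda label: spectral_angle(spectrum, library[label]))

# Made-up five-band reflectance values, not taken from the dataset.
library = {
    "cotton": [0.60, 0.55, 0.40, 0.45, 0.30],
    "polyester": [0.70, 0.50, 0.65, 0.35, 0.50],
}

# A darker, slightly noisy version of the "cotton" shape.
unknown = [0.31, 0.27, 0.20, 0.23, 0.15]

predicted = classify(unknown, library)
print(predicted)
```

Because the angle depends only on spectral shape, a darker or lighter sample of the same material still lands close to its library entry. That property can be useful when colour or illumination changes the overall reflectance level.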

## Full-text entities

- **Chemicals:** Carbon fibre (MESH:D000077482), Polyamide (MESH:D009757), Polyester (MESH:D011091), ISO 50 (-), Elastane (MESH:D011140), cellulose (MESH:D002482)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** L58W, A525F

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12925456/full.md

---
Source: https://tomesphere.com/paper/PMC12925456