# Multimodal and Hyperspectral Dataset for Segmentation of Bulky Waste using VIS, IR, NIR, and Terahertz Imaging

**Authors:** Manuel Bihler, Lukas Roming, Dovilė Čibiraitė-Lukenskienė, Jochen Aderhold, Andreas Keil, Friedrich Schlüter, Robin Gruna, Michael Heizmann

PMC · DOI: 10.1038/s41597-026-07053-1 · 2026-03-27

## TL;DR

This paper introduces WoodVIT, a publicly available dataset that combines visible (VIS), hyperspectral near-infrared (NIR), thermal infrared (IR), and terahertz (THz) imaging to support classification and segmentation of bulky waste, with a primary focus on distinguishing wood from non-wood materials.

## Contribution

The contribution is WoodVIT, a publicly available multimodal dataset of registered VIS, IR, NIR, and THz images for bulky-waste segmentation, with pixel masks, patch-wise annotations, predefined train/validation/test splits, and a primary wood versus non-wood benchmark task.

## Key findings

- The dataset contains 56 registered multi-sensor scenes, partitioned into 22,659 annotated patches with two main classes (wood and non-wood) and 16 subclass labels.
- Baseline performance using CNNs and fusion architectures is reported to establish reference metrics for future work.
- Challenging scenarios, such as occlusion and concealed contaminants (e.g., embedded metals), are included to motivate robust multimodal fusion approaches.

## Abstract

This study presents an annotated multi-sensor, multimodal, and hyperspectral dataset designed to support deep learning-based classification and segmentation of bulky waste. The dataset comprises four distinct sensor modalities: high-resolution visible RGB images (VIS), hyperspectral near-infrared (NIR), temporally resolved thermal infrared (IR), and terahertz (THz) imaging with depth information, providing complementary multimodal information. An image registration process aligns all modalities to a common reference frame, enabling near pixel-precise fusion across sensors. WoodVIT contains 56 registered multi-sensor scenes, partitioned into 22,659 annotated patches with two main classes (wood and non-wood) and 16 subclass labels. It includes pixel-masks and patch-wise annotations to facilitate both segmentation and classification tasks. The primary benchmark task is binary discrimination of wood versus non-wood. The dataset also includes challenging scenarios involving occlusion and concealed contaminants (e.g., embedded metals) to motivate robust multimodal fusion approaches. We provide predefined train/validation/test splits and report baseline results using convolutional neural networks and fusion architectures to establish reference performance. WoodVIT is publicly available to support research on multi-sensor learning for waste sorting.
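The relationship between pixel masks and patch-wise annotations described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the mask encoding (1 = wood, 0 = non-wood), the non-overlapping tiling, the patch size, the 0.5 threshold, and the helper names `iter_patches`, `patch_label`, and `label_scene` are all assumptions made for illustration, since the summary does not state how WoodVIT derives its patch labels.

```python
from typing import Iterator, List, Tuple

# Assumed encoding (not from the paper): 1 = wood pixel, 0 = non-wood pixel.
Mask = List[List[int]]


def iter_patches(mask: Mask, size: int) -> Iterator[Tuple[int, int, Mask]]:
    """Yield (top, left, patch) for non-overlapping size x size tiles.

    Incomplete tiles at the right and bottom borders are skipped.
    """
    height, width = len(mask), len(mask[0])
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            patch = [row[left:left + size] for row in mask[top:top + size]]
            yield top, left, patch


def patch_label(patch: Mask, threshold: float = 0.5) -> int:
    """Return 1 (wood) if the fraction of wood pixels reaches the threshold."""
    total = sum(len(row) for row in patch)
    wood = sum(sum(row) for row in patch)
    return 1 if wood / total >= threshold else 0


def label_scene(mask: Mask, size: int, threshold: float = 0.5) -> List[Tuple[int, int, int]]:
    """Tile a binary scene mask and attach a hypothetical patch-wise label."""
    return [
        (top, left, patch_label(patch, threshold))
        for top, left, patch in iter_patches(mask, size)
    ]


# Toy 4x4 mask standing in for one registered scene.
toy_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
labels = label_scene(toy_mask, size=2)
```

Because all modalities are registered to a common reference frame, the same `(top, left)` coordinates could in principle be used to crop the corresponding VIS, NIR, IR, and THz data for each patch; how the published dataset actually stores and indexes patches should be checked against its documentation.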

## Full-text entities

- **Diseases:** GT (MESH:D007815)
- **Chemicals:** halogen (MESH:D006219), cellulose (MESH:D002482), Metal (MESH:D008670), WoodVIT (-), polymers (MESH:D011108)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13035887/full.md

---
Source: https://tomesphere.com/paper/PMC13035887