TL;DR
This paper introduces BCAF, a bidirectional cross-attention fusion method that combines high-resolution RGB with low-resolution hyperspectral images for automated waste sorting, reporting state-of-the-art results (76.4% mIoU on SpectralWaste, 62.3% mIoU on K3I-Cycling).
Contribution
BCAF is a modality-agnostic fusion approach that aligns RGB and hyperspectral data at their native resolutions using localized cross-attention, avoiding pre-upsampling.
Findings
BCAF achieves 76.4% mIoU on the SpectralWaste dataset.
BCAF reaches 62.3% mIoU on the industrial K3I-Cycling dataset.
BCAF operates at 31 to 55 images per second.
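For readers unfamiliar with the metric quoted above, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth segmentation labels. The sketch below is a minimal, generic illustration, not the paper's evaluation code; the function name `mean_iou`, the flattened label lists, and the choice to skip classes absent from both prediction and target are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over flattened per-pixel class labels (hypothetical helper)."""
    inter = [0] * num_classes  # pixels where prediction and target agree on class c
    union = [0] * num_classes  # pixels where prediction or target is class c
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    # Assumption: classes that never occur in pred or target are ignored.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, target, num_classes=3), 3))  # 0.5
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmarks differ in how they treat ignored labels and absent classes, so reported numbers depend on the dataset's own protocol.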
Abstract
Growing waste streams and the transition to a circular economy require efficient automated waste sorting. In industrial settings, materials move on fast conveyor belts, where reliable identification and ejection demand pixel-accurate segmentation. RGB imaging delivers high-resolution spatial detail, which is essential for accurate segmentation, but it confuses materials that look similar in the visible spectrum. Hyperspectral imaging (HSI) provides spectral signatures that separate such materials, yet its lower spatial resolution limits detail. Effective waste sorting therefore needs methods that fuse both modalities to exploit their complementary strengths. We present Bidirectional Cross-Attention Fusion (BCAF), which aligns high-resolution RGB with low-resolution HSI at their native grids via localized, bidirectional cross-attention, avoiding pre-upsampling or early spectral collapse.…
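The idea of localized, bidirectional cross-attention between a fine RGB grid and a coarse HSI grid can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the integer scale factor between the grids, the neighborhood `radius`, the absence of learned query/key/value projections, and the residual update are all assumptions chosen to keep the example short. It only shows how each modality can query the other within a local window without upsampling the HSI features first.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, kv):
    """Scaled dot-product attention; kv serves as both keys and values."""
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ kv


def local_bidirectional_fusion(rgb, hsi, radius=1):
    """rgb: (H, W, C) fine features, hsi: (h, w, C) coarse features.

    Assumes H = s*h and W = s*w for an integer s (hypothetical setup).
    """
    H, W, C = rgb.shape
    h, w, c = hsi.shape
    s = H // h
    assert H == s * h and W == s * w and c == C
    rgb_out = np.empty_like(rgb)
    hsi_out = np.empty_like(hsi)
    for i in range(h):
        for j in range(w):
            # RGB pixels covered by coarse HSI cell (i, j).
            block = rgb[i * s:(i + 1) * s, j * s:(j + 1) * s].reshape(-1, C)
            # HSI tokens in a small window around cell (i, j), clipped at borders.
            i0, i1 = max(i - radius, 0), min(i + radius + 1, h)
            j0, j1 = max(j - radius, 0), min(j + radius + 1, w)
            window = hsi[i0:i1, j0:j1].reshape(-1, C)
            # Direction 1: RGB pixels query nearby HSI tokens.
            fused_block = block + attend(block, window)
            rgb_out[i * s:(i + 1) * s, j * s:(j + 1) * s] = fused_block.reshape(s, s, C)
            # Direction 2: the HSI token queries the RGB pixels it covers.
            hsi_out[i, j] = hsi[i, j] + attend(hsi[i, j][None, :], block)[0]
    return rgb_out, hsi_out


rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 8, 16))
hsi_feat = rng.standard_normal((4, 4, 16))
rgb_fused, hsi_fused = local_bidirectional_fusion(rgb_feat, hsi_feat)
print(rgb_fused.shape, hsi_fused.shape)  # (8, 8, 16) (4, 4, 16)
```

Both outputs keep their native grid sizes, which is the point of the sketch: the coarse HSI features are never interpolated to the RGB resolution, and the RGB features are never pooled down before the exchange.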
