FOCUS: Fused Observation of Channels for Unveiling Spectra
Xi Xiao, Aristeidis Tsaris, Anika Tabassum, John Lagergren, Larry M. York, Tianyang Wang, Xiao Wang

TL;DR
FOCUS is a framework that improves the interpretability of Vision Transformers (ViTs) for hyperspectral imaging by efficiently generating stable, spectral-aware saliency maps without modifying the model.
Contribution
It introduces class-specific spectral prompts and a learnable [SINK] token to improve spectral interpretability and stability in frozen ViTs for hyperspectral data.
Findings
Increases band-level IoU by 15%
Reduces attention collapse by over 40%
Produces saliency maps that align with expert annotations
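The summary does not define band-level IoU. A minimal sketch, assuming it is the intersection-over-union between the top-k spectral bands ranked by saliency and the set of bands an expert marked as relevant. The function name `band_level_iou` and the `top_k` parameter are hypothetical, not taken from the paper.

```python
from typing import Iterable, Sequence


def band_level_iou(saliency: Sequence[float], expert_bands: Iterable[int], top_k: int) -> float:
    """Hypothetical band-level IoU: overlap between the top-k salient bands and expert bands."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    # Rank band indices by saliency score, highest first.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    predicted = set(ranked[:top_k])
    expert = set(expert_bands)
    union = predicted | expert
    # Empty union means nothing to compare; treat as zero overlap.
    return len(predicted & expert) / len(union) if union else 0.0


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.05, 0.7, 0.2]  # toy per-band saliency
    print(band_level_iou(scores, {1, 2, 3}, top_k=3))  # 0.5
```

In the toy example the three most salient bands are {1, 2, 4}; two of them overlap the expert set {1, 2, 3}, and the union has four bands, giving 0.5.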
Abstract
Hyperspectral imaging (HSI) captures hundreds of narrow, contiguous wavelength bands, making it a powerful tool in biology, agriculture, and environmental monitoring. However, interpreting Vision Transformers (ViTs) in this setting remains largely unexplored due to two key challenges: (1) existing saliency methods struggle to capture meaningful spectral cues, often collapsing attention onto the class token, and (2) full-spectrum ViTs are computationally prohibitive for interpretability, given the high-dimensional nature of HSI data. We present FOCUS, the first framework that enables reliable and efficient spatial-spectral interpretability for frozen ViTs. FOCUS introduces two core components: class-specific spectral prompts that guide attention toward semantically meaningful wavelength groups, and a learnable [SINK] token trained with an attraction loss to absorb noisy or redundant…
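The abstract names the components but not their mechanics. The NumPy sketch below is one possible reading, under stated assumptions: "wavelength groups" are formed by averaging contiguous bands into a fixed number of groups; a class's prompt tokens and the [SINK] token are inserted after [CLS] and before the frozen backbone's patch tokens; and attention collapse is inspected as the average softmax attention mass that all tokens place on one position. All function names are hypothetical, and the attraction loss is not reproduced because its form is not given here.

```python
import numpy as np


def group_bands(cube: np.ndarray, num_groups: int) -> np.ndarray:
    """Average an (H, W, B) cube over contiguous band ranges -> (H, W, num_groups)."""
    groups = np.array_split(np.arange(cube.shape[-1]), num_groups)
    return np.stack([cube[..., idx].mean(axis=-1) for idx in groups], axis=-1)


def build_sequence(patch_tokens, cls_token, class_prompts, class_id, sink_token):
    """Hypothetical layout: [CLS], prompts for one class, [SINK], then patch tokens."""
    prompt = class_prompts[class_id]  # (P, D) prompt tokens for the chosen class
    return np.concatenate(
        [cls_token[None, :], prompt, sink_token[None, :], patch_tokens], axis=0
    )


def attention_mass(q: np.ndarray, k: np.ndarray, index: int) -> float:
    """Mean softmax attention that every query assigns to the token at `index`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return float(weights[:, index].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cube = rng.random((8, 8, 120))  # toy cube: 8x8 pixels, 120 bands
    print(group_bands(cube, num_groups=6).shape)  # (8, 8, 6)

    dim, num_prompts = 16, 2
    seq = build_sequence(
        patch_tokens=rng.random((4, dim)),
        cls_token=rng.random(dim),
        class_prompts=rng.random((3, num_prompts, dim)),
        class_id=1,
        sink_token=rng.random(dim),
    )
    print(seq.shape)  # (8, 16)
    sink_index = 1 + num_prompts  # position right after [CLS] and the prompts
    # Simplification: raw tokens serve as queries and keys (no learned projections).
    print("CLS mass:", attention_mass(seq, seq, 0))
    print("SINK mass:", attention_mass(seq, seq, sink_index))
```

In this reading, only the prompt and [SINK] embeddings would be trained while the backbone weights stay fixed, which is consistent with the abstract's focus on frozen ViTs and avoids retraining a full-spectrum model.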
