LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition

Lingfeng Liu; Dong Ni; Hangjie Yuan

arXiv:2403.01412 · cs.CV · March 5, 2024 · 1 citation

TL;DR

LUM-ViT adds a learnable under-sampling mask to a Vision Transformer so that optical signals can be acquired with far less data. Sampling only 10% of the original image pixels keeps the accuracy loss within 1.8% on ImageNet classification.

Contribution

The paper presents a novel learnable under-sampling mask integrated into a Vision Transformer for pre-acquisition modulation, optimized for optical hardware implementation.
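As a rough illustration of what an under-sampling mask does at acquisition time, the sketch below keeps only the highest-scoring fraction of mask positions and discards the rest. It is a minimal forward-pass sketch, not the authors' implementation: the function names `undersampling_mask` and `apply_mask`, the per-position `scores`, and the top-k selection rule are all assumptions made for illustration. In a trainable version the scores would be learned parameters, and a gradient approximation would be needed to train through the hard 0/1 decision.

```python
def undersampling_mask(scores, keep_ratio):
    """Return a 0/1 mask that keeps the highest-scoring fraction of positions.

    `scores` stands in for learnable per-position parameters (hypothetical).
    """
    n = len(scores)
    k = max(1, round(keep_ratio * n))
    # Indices of the k largest scores; ties are broken by the lower index.
    keep = sorted(range(n), key=lambda i: (-scores[i], i))[:k]
    mask = [0] * n
    for i in keep:
        mask[i] = 1
    return mask


def apply_mask(measurements, mask):
    """Keep only the measurements whose mask entry is 1 (those actually acquired)."""
    return [m for m, keep in zip(measurements, mask) if keep == 1]


# Ten positions, 10% sampling ratio: exactly one position is acquired.
scores = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7, 0.1, -2.0, 0.6, 1.1]
mask = undersampling_mask(scores, keep_ratio=0.1)
measurements = [[float(i)] * 4 for i in range(10)]
kept = apply_mask(measurements, mask)
```

Here the mask is decided before any data is read out, which is the sense in which the modulation is "pre-acquisition": positions with a zero mask entry never need to be transmitted.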

Findings

01 Sampling 10% of the original image pixels keeps the accuracy loss within 1.8% on ImageNet classification.

02 Maintains near-original accuracy on real-world optical hardware.

03 Proposes kernel-level weight binarization and a three-stage fine-tuning strategy.
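This page does not spell out how the kernel-level weight binarization works, so the following is only one plausible reading: each kernel is reduced to a sign pattern in {-1, +1} plus a single real-valued scale, taken here as the mean absolute weight of that kernel. The helper names `binarize_kernel` and `binarize_layer` and the choice of scale are assumptions, and the paper's actual scheme may differ.

```python
def binarize_kernel(kernel):
    """Binarize one kernel: return (signs, alpha).

    signs holds +1 or -1 per weight; alpha is the kernel's mean absolute weight,
    used here as an assumed per-kernel scale.
    """
    alpha = sum(abs(w) for w in kernel) / len(kernel)
    signs = [1 if w >= 0 else -1 for w in kernel]
    return signs, alpha


def binarize_layer(kernels):
    """Binarize every kernel independently, so each keeps its own scale."""
    return [binarize_kernel(k) for k in kernels]


# Two flattened kernels with different magnitudes (illustrative values).
kernels = [[0.5, -0.25, 0.75, -1.0], [-0.1, 0.2, -0.3, 0.4]]
binary = binarize_layer(kernels)
```

Binarizing per kernel rather than per layer lets kernels with very different magnitudes keep separate scales, while the binary pattern itself is the part that an optical modulator with two states could realize.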

Abstract

Bandwidth constraints during signal acquisition frequently impede real-time detection applications. Hyperspectral data is a notable example, whose vast volume compromises real-time hyperspectral detection. To tackle this hurdle, we introduce a novel approach leveraging pre-acquisition modulation to reduce the acquisition volume. This modulation process is governed by a deep learning model, utilizing prior information. Central to our approach is LUM-ViT, a Vision Transformer variant. Uniquely, LUM-ViT incorporates a learnable under-sampling mask tailored for pre-acquisition modulation. To further optimize for optical calculations, we propose a kernel-level weight binarization technique and a three-stage fine-tuning strategy. Our evaluations reveal that, by sampling a mere 10% of the original image pixels, LUM-ViT maintains the accuracy loss within 1.8% on the ImageNet classification…

Code & Models

Repositories

maxllf/lum-vit (PyTorch, official)

Taxonomy

Topics: Photonic and Optical Devices · Advanced optical system design · Optical Systems and Laser Technology

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Dropout · Multi-Head Attention · Softmax · Dense Connections · Label Smoothing · Adam