Metasurface-generated large and arbitrary analog convolution kernels for accelerated machine vision

Ruiqi Liang, Shuai Wang, Yiying Dong, Liu Li, Ying Kuang, Bohan Zhang, and Yuanmu Yang

arXiv:2409.18614 · physics.optics · September 30, 2024

Open Access

TL;DR

This paper introduces an analog convolution method based on an optical metasurface. The metasurface replaces the digital convolution layer and implements large, arbitrarily shaped convolution kernels, with the aim of reducing the processing time and power consumption of machine vision tasks while keeping classification accuracy high.

Contribution

The paper presents a spatial frequency domain training method for creating arbitrarily shaped analog convolution kernels, with an optical metasurface acting as the convolution layer, and reports high classification accuracy on standard datasets.

Findings

01 Achieved 98.59% classification accuracy on MNIST

02 Simulations show 92.63% classification accuracy on Fashion-MNIST

03 Demonstrated large, arbitrarily shaped kernels whose receptive field surpasses that of digital convolution kernels

Abstract

In the rapidly evolving field of artificial intelligence, convolutional neural networks are essential for tackling complex challenges such as machine vision and medical diagnosis. Recently, to address the challenges in processing speed and power consumption of conventional digital convolution operations, many optical components have been suggested to replace the digital convolution layer in the neural network, accelerating various machine vision tasks. Nonetheless, the analog nature of the optical convolution kernel has not been fully explored. Here, we develop a spatial frequency domain training method to create arbitrarily shaped analog convolution kernels using an optical metasurface as the convolution layer, with its receptive field largely surpassing digital convolution kernels. By employing spatial multiplexing, the multiple parallel convolution kernels with both positive and…
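As a rough illustration of why a spatial frequency domain description suits large kernels, the sketch below uses the convolution theorem: a convolution with a kernel of any size becomes a single pointwise multiplication of spectra. This is not the authors' training method or a model of the metasurface. The function names `fft_circular_conv` and `direct_circular_conv`, the 15×15 kernel size, and the periodic boundary handling are all assumptions chosen for the example; the 28×28 input simply matches the MNIST image size.

```python
import numpy as np

def fft_circular_conv(image, kernel):
    """Circular 2-D convolution computed in the spatial frequency domain."""
    # Zero-pad the kernel to the image size so both spectra share one grid.
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Convolution theorem: multiply the spectra pointwise, then invert.
    spectrum = np.fft.fft2(image) * np.fft.fft2(padded)
    return np.real(np.fft.ifft2(spectrum))

def direct_circular_conv(image, kernel):
    """Reference circular convolution as a sum of shifted, weighted images."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # np.roll shifts the image by (i, j) with wrap-around.
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))            # stand-in for one MNIST-sized input
kernel = rng.standard_normal((15, 15))  # hypothetical large, signed kernel

freq_result = fft_circular_conv(image, kernel)
direct_result = direct_circular_conv(image, kernel)
print(np.allclose(freq_result, direct_result))
```

In the direct form the cost grows with the kernel area, whereas the frequency-domain form costs the same regardless of kernel size, which is one way to see why a frequency-domain parameterization places no particular penalty on a large receptive field.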

Taxonomy

Topics: Advanced Memory and Neural Computing · Metamaterials and Metasurfaces Applications · CCD and CMOS Imaging Sensors

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution