Metasurface-generated large and arbitrary analog convolution kernels for accelerated machine vision
Ruiqi Liang, Shuai Wang, Yiying Dong, Liu Li, Ying Kuang, Bohan Zhang, and Yuanmu Yang

TL;DR
This paper introduces an optical metasurface-based analog convolution method that aims to accelerate machine vision tasks by enabling large, arbitrarily shaped convolution kernels, with high classification accuracy and a receptive field larger than that of digital convolution kernels.
Contribution
It presents a spatial frequency domain training approach to create customizable optical convolution kernels using metasurfaces, demonstrating high classification accuracy on standard datasets.
Findings
Achieved 98.59% accuracy on MNIST
Simulations show 92.63% accuracy on Fashion-MNIST
Demonstrated large, arbitrarily shaped kernels whose receptive field surpasses that of digital convolution kernels
Abstract
In the rapidly evolving field of artificial intelligence, convolutional neural networks are essential for tackling complex challenges such as machine vision and medical diagnosis. Recently, to address the challenges in processing speed and power consumption of conventional digital convolution operations, many optical components have been suggested to replace the digital convolution layer in the neural network, accelerating various machine vision tasks. Nonetheless, the analog nature of the optical convolution kernel has not been fully explored. Here, we develop a spatial frequency domain training method to create arbitrarily shaped analog convolution kernels using an optical metasurface as the convolution layer, with its receptive field largely surpassing digital convolution kernels. By employing spatial multiplexing, the multiple parallel convolution kernels with both positive and…
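The idea of specifying a kernel in the spatial frequency domain rests on the convolution theorem: convolving an image with a kernel is equivalent to multiplying their Fourier spectra. The NumPy sketch below illustrates only that identity; it is not the authors' training method or a model of their metasurface. The function names, the 28×28 random input, the 15×15 random kernel, and the circular (wrap-around) boundary handling are all assumptions chosen for illustration.

```python
import numpy as np

def conv2d_frequency_domain(image, kernel):
    """Circular 2D convolution computed by multiplying spectra (convolution theorem)."""
    h, w = image.shape
    # Zero-pad the kernel to the image size; its FFT acts as a transfer function.
    transfer = np.fft.fft2(kernel, s=(h, w))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))

def conv2d_direct(image, kernel):
    """Reference circular 2D convolution computed as a sum of shifted copies."""
    out = np.zeros(image.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # np.roll shifts the image by (i, j) with wrap-around.
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))             # hypothetical MNIST-sized input
kernel = rng.standard_normal((15, 15))   # hypothetical large real-valued kernel

freq = conv2d_frequency_domain(image, kernel)
direct = conv2d_direct(image, kernel)
max_abs_diff = float(np.max(np.abs(freq - direct)))
print(max_abs_diff)  # agreement up to floating-point rounding
```

In the frequency-domain version the cost of the multiplication does not depend on the kernel's spatial extent, which gives some intuition for why a large receptive field is less of a burden when the kernel is expressed as a transfer function.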
Taxonomy
Topics: Advanced Memory and Neural Computing · Metamaterials and Metasurfaces Applications · CCD and CMOS Imaging Sensors
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution
