Convolutional Neural Network (CNN) vs Vision Transformer (ViT) for Digital Holography
Stéphane Cuenat, Raphaël Couturier

TL;DR
This paper compares CNN and ViT deep learning architectures for auto-focusing in digital holography, demonstrating that ViT achieves high accuracy and robustness with finer distance classification than previous methods.
Contribution
It introduces a novel application of ViT for auto-focusing in digital holography, achieving 1 μm classification granularity and improved robustness over CNN.
Findings
ViT achieves similar accuracy to CNN in auto-focusing.
ViT is more robust than CNN.
The classification granularity is improved to 1 μm.
Abstract
In Digital Holography (DH), it is crucial to extract the object distance from a hologram in order to reconstruct its amplitude and phase. This step is called auto-focusing, and it is conventionally solved by first reconstructing a stack of images and then scoring the sharpness of each reconstructed image with a focus metric such as entropy or variance. The distance corresponding to the sharpest image is considered the focal position. This approach, while effective, is computationally demanding and time-consuming. In this paper, the distance is instead determined by Deep Learning (DL). Two DL architectures are compared: Convolutional Neural Network (CNN) and Vision Transformer (ViT). Both treat auto-focusing as a classification problem. Compared to a first attempt [11] in which the distance between two consecutive classes was…
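The conventional procedure described in the abstract (reconstruct a stack, score each image, keep the sharpest) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the angular spectrum method is assumed as the numerical propagator, amplitude variance is used as the focus metric, and the names `angular_spectrum_propagate` and `autofocus`, the parameters, and the sign convention for the distance are all hypothetical.

```python
import numpy as np


def angular_spectrum_propagate(field, distance, wavelength, pixel_size):
    """Propagate a complex 2D field by `distance` (assumed propagator:
    angular spectrum method; all lengths in metres)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)  # spatial frequencies along x
    fy = np.fft.fftfreq(ny, d=pixel_size)  # spatial frequencies along y
    FX, FY = np.meshgrid(fx, fy)
    # Squared longitudinal frequency; non-positive values are evanescent.
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    # Transfer function, with evanescent components zeroed out.
    transfer = np.exp(1j * kz * distance) * (arg > 0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def autofocus(hologram, distances, wavelength, pixel_size):
    """Return the candidate distance whose reconstruction maximises the
    variance of the amplitude, together with all focus scores."""
    scores = []
    for d in distances:
        image = angular_spectrum_propagate(hologram, d, wavelength, pixel_size)
        scores.append(float(np.var(np.abs(image))))  # variance focus metric
    best = distances[int(np.argmax(scores))]
    return best, scores
```

Each candidate distance costs two FFTs plus a metric evaluation, so a fine search grid over a wide range becomes expensive. That is the cost the abstract refers to, and the motivation for predicting the distance directly with a classifier.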
Taxonomy
Topics: Image Processing Techniques and Applications · Digital Holography and Microscopy · Cell Image Analysis Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Label Smoothing · Residual Connection · Layer Normalization · Dense Connections · Adam · Absolute Position Encodings
