FocDepthFormer: Transformer with latent LSTM for Depth Estimation from Focal Stack
Xueyang Kang, Fengze Han, Abdur R. Fayjie, Patrick Vandewalle, Kourosh Khoshelham, Dong Gong

TL;DR
FocDepthFormer is a Transformer-LSTM hybrid model for depth estimation from focal stacks. It handles stacks of arbitrary length and improves accuracy over existing CNN-based methods.
Contribution
The paper proposes a novel Transformer-based network with an LSTM module for depth estimation from focal stacks, overcoming fixed stack length limitations of prior CNN methods.
Findings
Outperforms state-of-the-art methods on multiple benchmarks
Can be pre-trained on monocular RGB datasets to enhance performance
Effectively processes focal stacks of arbitrary length
Abstract
Most existing methods for depth estimation from a focal stack of images employ convolutional neural networks (CNNs) using 2D or 3D convolutions over a fixed set of images. However, their effectiveness is constrained by the local properties of CNN kernels, which restricts them to processing focal stacks with a fixed number of images during both training and inference. This limitation hampers their ability to generalize to stacks of arbitrary length. To overcome these limitations, we present a novel Transformer-based network, FocDepthFormer, which integrates a Transformer with an LSTM module and a CNN decoder. The Transformer's self-attention mechanism allows the network to learn more informative spatial features by implicitly performing non-local cross-referencing. The LSTM module is designed to integrate representations across image stacks of varying lengths. Additionally, we employ…
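The abstract's point about varying stack lengths can be illustrated with a small sketch. It is not the authors' implementation: the per-image feature vectors are random stand-ins for what an encoder would produce, the hypothetical `TinyLSTMCell` is untrained, and no decoder is included. It only shows, under those assumptions, that a recurrent cell folds a stack of any length into a state of fixed size.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """Minimal, untrained LSTM cell on plain Python lists (illustrative only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, g = candidate, o = output.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def _affine(self, gate, xh):
        return [
            sum(w * v for w, v in zip(row, xh)) + bias
            for row, bias in zip(self.W[gate], self.b[gate])
        ]

    def step(self, x, h, c):
        xh = list(x) + list(h)  # concatenate input and previous hidden state
        i = [sigmoid(v) for v in self._affine("i", xh)]
        f = [sigmoid(v) for v in self._affine("f", xh)]
        g = [math.tanh(v) for v in self._affine("g", xh)]
        o = [sigmoid(v) for v in self._affine("o", xh)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def fuse_stack(cell, stack_features):
    """Fold per-image features into one hidden state, one image at a time."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for feat in stack_features:
        h, c = cell.step(feat, h, c)
    return h


cell = TinyLSTMCell(input_size=4, hidden_size=3)
rng = random.Random(1)
# Random stand-ins for per-image features of a 2-image and a 7-image stack.
short_stack = [[rng.random() for _ in range(4)] for _ in range(2)]
long_stack = [[rng.random() for _ in range(4)] for _ in range(7)]

fused_short = fuse_stack(cell, short_stack)
fused_long = fuse_stack(cell, long_stack)
print(len(fused_short), len(fused_long))  # prints "3 3"
```

Both stacks yield a state of the same size, so whatever consumes that state never needs to know how many images were in the stack. A fixed-input convolution over the stack dimension has no equivalent property, which is the limitation the abstract attributes to prior CNN methods.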
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Cell Image Analysis Techniques
Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Softmax · Residual Connection · Absolute Position Encodings · Layer Normalization · Adam
