FocDepthFormer: Transformer with latent LSTM for Depth Estimation from Focal Stack

Xueyang Kang; Fengze Han; Abdur R. Fayjie; Patrick Vandewalle; Kourosh Khoshelham; Dong Gong

arXiv:2310.11178·cs.CV·December 5, 2024·2 cites

Open Access

TL;DR

FocDepthFormer introduces a Transformer-LSTM hybrid model for depth estimation from focal stacks, enabling flexible stack length processing and improved accuracy over existing CNN-based methods.

Contribution

The paper proposes a novel Transformer-based network with an LSTM module for depth estimation from focal stacks, overcoming fixed stack length limitations of prior CNN methods.

Findings

01 Outperforms state-of-the-art methods on multiple benchmarks

02 Can be pre-trained on monocular RGB datasets to enhance performance

03 Effectively processes focal stacks of arbitrary length

Abstract

Most existing methods for depth estimation from a focal stack of images employ convolutional neural networks (CNNs) using 2D or 3D convolutions over a fixed set of images. However, their effectiveness is constrained by the local properties of CNN kernels, and they are restricted to processing focal stacks with a fixed number of images during both training and inference. This limitation hampers their ability to generalize to stacks of arbitrary length. To overcome these limitations, we present a novel Transformer-based network, FocDepthFormer, which integrates a Transformer with an LSTM module and a CNN decoder. The Transformer's self-attention mechanism allows the network to learn more informative spatial features by implicitly performing non-local cross-referencing. The LSTM module is designed to integrate representations across image stacks of varying lengths. Additionally, we employ…
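
To illustrate why a recurrent module can accept stacks of any length, the following is a minimal sketch, not the authors' implementation. It assumes each image in the stack has already been reduced to a fixed-size feature vector (in the paper this role is played by the Transformer encoder; here random vectors stand in for it). The names `TinyLSTMCell`, `fuse_stack`, and `fake_features` are hypothetical and chosen only for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """A single LSTM cell over plain Python lists (illustrative only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, g = candidate, o = output.
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def _affine(self, gate, xh):
        return [
            sum(w * v for w, v in zip(row, xh)) + bias
            for row, bias in zip(self.w[gate], self.b[gate])
        ]

    def step(self, x, h, c):
        xh = x + h  # concatenate input features and previous hidden state
        i = [sigmoid(v) for v in self._affine("i", xh)]
        f = [sigmoid(v) for v in self._affine("f", xh)]
        g = [math.tanh(v) for v in self._affine("g", xh)]
        o = [sigmoid(v) for v in self._affine("o", xh)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def fuse_stack(cell, stack_features):
    """Fold per-image features into one fixed-size latent, one image at a time."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in stack_features:
        h, c = cell.step(x, h, c)
    return h


def fake_features(num_images, dim, seed):
    """Stand-in for encoder outputs: one random feature vector per image."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_images)]


cell = TinyLSTMCell(input_size=8, hidden_size=4)
fused_short = fuse_stack(cell, fake_features(3, 8, seed=1))
fused_long = fuse_stack(cell, fake_features(10, 8, seed=2))
print(len(fused_short), len(fused_long))
```

The same cell weights process a 3-image and a 10-image stack and return a latent of identical size in both cases, so a downstream decoder never needs to know the stack length. A fixed-input 2D or 3D convolution over the stack dimension does not have this property without padding or resampling.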

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Cell Image Analysis Techniques

Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Softmax · Residual Connection · Absolute Position Encodings · Layer Normalization · Adam