Stereo Video Reconstruction Without Explicit Depth Maps for Endoscopic Surgery

Annika Brundyn; Jesse Swanson; Kyunghyun Cho; Doug Kondziolka; Eric Oermann

arXiv:2109.08227·eess.IV·September 20, 2021

Open Access

TL;DR

This paper presents a deep learning approach for stereo video reconstruction in endoscopic surgery, enabling 3D visualization without explicit depth maps, and demonstrates its effectiveness through expert evaluations.

Contribution

The study introduces a novel U-Net-based method that leverages multiple frames for stereo reconstruction, validated by expert surgeon assessments and correlation with automatic metrics.

Findings

01. Multiple frames improve stereo reconstruction quality.

02. Surgeons perceive depth effectively from reconstructed 3D videos.

03. Automatic metrics LPIPS and DISTS correlate with expert judgment.

Abstract

We introduce the task of stereo video reconstruction or, equivalently, 2D-to-3D video conversion for minimally invasive surgical video. We design and implement a series of end-to-end U-Net-based solutions for this task by varying the input (single frame vs. multiple consecutive frames), loss function (MSE, MAE, or perceptual losses), and network architecture. We evaluate these solutions by surveying ten experts - surgeons who routinely perform endoscopic surgery. We run two separate reader studies: one evaluating individual frames and the other evaluating fully reconstructed 3D video played on a VR headset. In the first reader study, a variant of the U-Net that takes as input multiple consecutive video frames and outputs the missing view performs best. We draw two conclusions from this outcome. First, motion information coming from multiple past frames is crucial in recreating stereo…
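The abstract names two of the knobs that are varied: the input (one frame vs. several consecutive frames) and the loss (MSE, MAE, or perceptual). The snippet below is a minimal sketch of the first two losses and of one possible way to feed several frames to a network. It is not the authors' code. Concatenating frames along the channel axis, the channels x height x width layout, and the names `stack_frames`, `mse_loss`, and `mae_loss` are all assumptions made for illustration; the excerpt does not say how frames are combined.

```python
from typing import List

# Hypothetical layout: a frame is channels x height x width nested lists.
Frame = List[List[List[float]]]


def stack_frames(frames: List[Frame]) -> Frame:
    """Concatenate consecutive frames along the channel axis (an assumption)."""
    stacked: Frame = []
    for frame in frames:
        stacked.extend(frame)
    return stacked


def _flatten(frame: Frame) -> List[float]:
    """Return every pixel value of a frame as one flat list."""
    return [value for channel in frame for row in channel for value in row]


def mse_loss(pred: Frame, target: Frame) -> float:
    """Mean squared error between a predicted view and the true view."""
    p, t = _flatten(pred), _flatten(target)
    if len(p) != len(t) or not p:
        raise ValueError("pred and target must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


def mae_loss(pred: Frame, target: Frame) -> float:
    """Mean absolute error between a predicted view and the true view."""
    p, t = _flatten(pred), _flatten(target)
    if len(p) != len(t) or not p:
        raise ValueError("pred and target must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)
```

With three RGB frames, `stack_frames` yields a 9-channel input, while the target stays a single 3-channel frame of the missing view. MSE penalizes large per-pixel errors more heavily than MAE, which is one reason the two can rank reconstructions differently.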

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques

Methods: Max Pooling · Convolution · Concatenated Skip Connection · U-Net