Stereo Video Reconstruction Without Explicit Depth Maps for Endoscopic Surgery
Annika Brundyn, Jesse Swanson, Kyunghyun Cho, Doug Kondziolka, Eric Oermann

TL;DR
This paper presents a deep learning approach for stereo video reconstruction in endoscopic surgery, enabling 3D visualization without explicit depth maps, and demonstrates its effectiveness through expert evaluations.
Contribution
The study introduces U-Net-based methods that use multiple consecutive frames for stereo reconstruction, validates them through expert surgeon assessments, and examines how well automatic metrics correlate with those assessments.
Findings
Multiple frames improve stereo reconstruction quality.
Surgeons perceive depth effectively from reconstructed 3D videos.
Automatic metrics LPIPS and DISTS correlate with expert judgment.
Abstract
We introduce the task of stereo video reconstruction or, equivalently, 2D-to-3D video conversion for minimally invasive surgical video. We design and implement a series of end-to-end U-Net-based solutions for this task by varying the input (single frame vs. multiple consecutive frames), loss function (MSE, MAE, or perceptual losses), and network architecture. We evaluate these solutions by surveying ten experts - surgeons who routinely perform endoscopic surgery. We run two separate reader studies: one evaluating individual frames and the other evaluating fully reconstructed 3D video played on a VR headset. In the first reader study, a variant of the U-Net that takes as input multiple consecutive video frames and outputs the missing view performs best. We draw two conclusions from this outcome. First, motion information coming from multiple past frames is crucial in recreating stereo…
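The multi-frame input described in the abstract can be illustrated with a short sketch. This is not the authors' code: it only shows one plausible way to pair a window of consecutive frames from one view with the corresponding frame of the missing view, plus the two pixel-wise losses the abstract names (MSE and MAE). The function names, the window length `k`, the channel-wise stacking, and the choice of the left view as input are all assumptions made for illustration. The perceptual losses and the U-Net itself are omitted.

```python
import numpy as np


def make_multiframe_samples(left, right, k=3):
    """Pair each window of k consecutive left-view frames with the
    right-view frame at the window's last time step.

    left, right: arrays of shape (T, H, W, C).
    Returns inputs of shape (T - k + 1, H, W, k * C)
    and targets of shape (T - k + 1, H, W, C).
    Hypothetical helper: the paper's actual input layout is not given here.
    """
    T = left.shape[0]
    if T < k:
        raise ValueError("need at least k frames")
    # Stack the k frames of each window along the channel axis, oldest first.
    windows = [
        np.concatenate(list(left[t - k + 1 : t + 1]), axis=-1)
        for t in range(k - 1, T)
    ]
    inputs = np.stack(windows)
    targets = right[k - 1 :]
    return inputs, targets


def mse(pred, target):
    """Mean squared error over all pixels and channels."""
    return float(np.mean((pred - target) ** 2))


def mae(pred, target):
    """Mean absolute error over all pixels and channels."""
    return float(np.mean(np.abs(pred - target)))


# Toy data: 5 random frames of size 4x6 with 3 colour channels per view.
rng = np.random.default_rng(0)
left = rng.random((5, 4, 6, 3))
right = rng.random((5, 4, 6, 3))

x, y = make_multiframe_samples(left, right, k=3)
print(x.shape, y.shape)  # (3, 4, 6, 9) (3, 4, 6, 3)
```

With `k=1` the same helper produces the single-frame setting, so the two input variants differ only in how many past frames are stacked before being passed to the network.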
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques
Methods: Max Pooling · Convolution · Concatenated Skip Connection · U-Net
