EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner
Kutsev Bengisu Ozyoruk, Guliz Irem Gokceler, Gulfize Coskun, Kagan Incetan, Yasin Almalioglu, Faisal Mahmood, Eva Curto, Luis Perdigoto, Marina Oliveira, Hasan Sahin, Helder Araujo, Henrique Alexandrino, Nicholas J. Durr, Hunter B. Gilbert, and Mehmet Turan

TL;DR
This paper introduces a comprehensive endoscopic SLAM dataset with diverse data types and ground truth, and proposes Endo-SfMLearner, an unsupervised deep learning method for monocular depth and pose estimation in endoscopic videos.
Contribution
The paper provides a new extensive dataset for endoscopic SLAM with ground truth and synthetic data, and develops Endo-SfMLearner, a novel unsupervised approach utilizing residual networks and attention for depth and pose estimation.
Findings
Endo-SfMLearner outperforms existing methods on the dataset.
The dataset enables effective benchmarking of endoscopic SLAM algorithms.
Synthetic data facilitates transfer learning to real endoscopic videos.
Abstract
Deep learning techniques hold promise to develop dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings, as well as synthetically generated data. A Panda robotic arm, two commercially available capsule endoscopes, two conventional endoscopes with different camera properties, and two high-precision 3D scanners were employed to collect data from 8 ex-vivo porcine gastrointestinal (GI)-tract organs. In total, 35 sub-datasets are provided with 6D pose ground truth for the ex-vivo part: 18 sub-datasets for colon, 12 sub-datasets for stomach, and 5 sub-datasets for small intestine, while four of these contain…
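As a rough illustration of how 6D pose ground truth can support quantitative benchmarking, the sketch below chains frame-to-frame pose estimates into a trajectory and scores it against ground truth. It is a minimal example under stated assumptions, not the paper's evaluation protocol: the function names, the 4x4 homogeneous-matrix pose format, and the single least-squares scale factor (used because monocular estimates are only defined up to scale) are all choices made here for illustration.

```python
import numpy as np


def relative_to_absolute(rel_poses):
    """Chain 4x4 frame-to-frame transforms into absolute poses.

    Hypothetical helper: the first pose is taken to be the identity.
    """
    poses = [np.eye(4)]
    for T in rel_poses:
        poses.append(poses[-1] @ T)
    return np.stack(poses)


def scaled_translation_rmse(est_poses, gt_poses):
    """RMSE between ground-truth and estimated camera positions.

    A single least-squares scale factor is fitted first. No rotational
    alignment is applied, which is a simplification of common practice.
    """
    est_t = est_poses[:, :3, 3]
    gt_t = gt_poses[:, :3, 3]
    denom = np.sum(est_t * est_t)
    scale = np.sum(est_t * gt_t) / denom if denom > 0 else 1.0
    err = gt_t - scale * est_t
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))


def step(dx):
    """Toy relative pose: pure translation along x (illustrative only)."""
    T = np.eye(4)
    T[0, 3] = dx
    return T


# Toy data: the estimate has the correct shape but half the true scale.
gt = relative_to_absolute([step(2.0)] * 3)
est = relative_to_absolute([step(1.0)] * 3)
rmse = scaled_translation_rmse(est, gt)
```

In this toy case the estimated trajectory differs from the ground truth only by a global scale, so the error after scale fitting is numerically zero. Real evaluations usually also align rotation and report rotational error separately.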
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
Methods: Convolution · Sigmoid Activation · Average Pooling · Max Pooling
