OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Hanchen Wang, Yixuan Wu, Yinan Feng, Peng Jin, Luoyuan Zhang, Shihang Feng, James Wiskin, Baris Turkbey, Peter A. Pinto, Bradford J. Wood, Songting Luo, Yinpeng Chen, Emad Boctor, and Youzuo Lin

TL;DR
OPENPROS is a comprehensive large-scale dataset for limited-view prostate ultrasound computed tomography, enabling systematic evaluation of machine learning methods for improved tissue parameter reconstruction in prostate cancer imaging.
Contribution
The paper introduces OPENPROS, the first large-scale, realistic dataset for prostate USCT, facilitating research in inverse problems and machine learning applications.
Findings
Deep learning improves speed and accuracy over physics-based methods.
Challenges remain in robustness and high-resolution reconstruction.
Benchmark results highlight the need for further research in generalization.
Abstract
Prostate cancer is one of the most prevalent and deadly cancers among men, motivating the development of accurate and accessible imaging technologies for early detection. Ultrasound computed tomography (USCT) reconstructs quantitative tissue parameters such as speed-of-sound (SOS) and is a promising low-cost alternative to existing modalities. However, prostate USCT remains challenging due to limited-angle acquisition, strong tissue heterogeneity, bone-induced wave distortion, and the lack of large-scale, anatomically realistic datasets for method development and evaluation. We introduce OPENPROS, the first large-scale benchmark dataset for limited-angle prostate USCT, designed to systematically evaluate machine learning methods for quantitative inverse problems. OPENPROS contains over 280,000 paired samples of realistic 2D SOS maps and corresponding ultrasound full-waveform data,…
Peer Reviews
Decision: ICLR 2026 Poster
1. Addresses a clinically important and computationally demanding imaging problem.
2. Provides an open, standardized resource that makes it easier for others to design and evaluate their own models.
3. The selected benchmarks are reasonable and cover both physics-based and learned approaches.
4. The paper is technically sound, and I think the simulation details are sufficient for someone who wants to reproduce the results.
My main concern: the dataset is based on only four patient anatomies, and this is mentioned only deep in the methods and appendix (I think it should be stated in the abstract or, at the very least, the introduction). The anatomical diversity is therefore extremely limited!
- All simulations are 2D and reconstruct only SOS, ignoring attenuation and density.
- The machine-learning component is incremental (standard CNN and ViT baselines) without any more recent architectures or hybrid physics-learning in
- The paper is very well written and justified.
- The paper does a good job of designing tasks for the dataset that could potentially determine its efficacy. The areas of (1) inference efficiency, (2) reconstruction accuracy, and (3) out-of-distribution generalization are all interesting and important.
- The paper does a good job of comparing to the USCT literature, including deep learning methods for image reconstruction.
- The dataset size is a severe limitation in my opinion. It is not at all clear why 280K samples are needed for machine learning training. Ablation experiments that vary the number of generated samples would help here by showing the benefit of additional samples and justifying the approach.
- The metrics in Table 5 are not given enough context. Figure 5 offers some insight into the potential benefits of data-driven methods, but this could be better quantified using the annotations.
-
1. This is the first public, large-scale dataset specifically for limited-angle prostate USCT.
2. The method of creating phantoms by combining real MRI/CT anatomical structures with ex vivo SOS measurements is a key strength, ensuring high anatomical realism.
3. Providing the FDTD and Runge-Kutta solvers is a valuable contribution that enhances reproducibility and enables future work.
4. The authors are transparent about the failures of current DL methods, particularly in resolving fine detai
1. The most significant weakness is that the 280,000 samples are derived from the anatomical structures of only 4 patients. This homogeneity means models may simply overfit to these 4 anatomical configurations, and the "patient-level" OOD test (train on 3, test on 1) is statistically insufficient to make strong claims about generalization.
2. The paper is heavily motivated by cancer detection and claims the dataset includes "synthetic lesions". However, the entire benchmark and results section
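
The patient-level split mentioned in this review (train on 3 patients, test on 1) can be illustrated with a minimal sketch. It assumes each sample carries a patient identifier; the labels `P1` to `P4`, the toy sample counts, and the helper name `leave_one_patient_out` are hypothetical and are not taken from the OPENPROS release.

```python
def leave_one_patient_out(patient_ids, held_out):
    """Split sample indices so the held-out patient appears only in the test set."""
    train_idx, test_idx = [], []
    for i, pid in enumerate(patient_ids):
        (test_idx if pid == held_out else train_idx).append(i)
    return train_idx, test_idx


# Hypothetical toy labels: 4 patients with 3 samples each.
patient_ids = [p for p in ("P1", "P2", "P3", "P4") for _ in range(3)]

# One fold per patient: train on the other three, test on the held-out one.
folds = {}
for held_out in sorted(set(patient_ids)):
    folds[held_out] = leave_one_patient_out(patient_ids, held_out)

for held_out, (train_idx, test_idx) in folds.items():
    train_patients = {patient_ids[i] for i in train_idx}
    print(held_out, "train patients:", sorted(train_patients), "test samples:", len(test_idx))
```

With four patients there are only four such folds, however many samples each patient contributes, which is the limitation the reviewer points to.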
Taxonomy
Topics: Ultrasound Imaging and Elastography · Ultrasound and Hyperthermia Applications · Photoacoustic and Ultrasonic Imaging
