Surgical Text-to-Image Generation
Chinedu Innocent Nwoye, Rupak Bose, Kareem Elgohary, Lorenzo Arboit, Giorgio Carlino, Joël L. Lavanchy, Pietro Mascagni, Nicolas Padoy

TL;DR
This paper adapts text-to-image generative models for surgical images using triplet-based captions, achieving high-quality, realistic images that could reduce reliance on costly real data collection.
Contribution
It introduces Surgical Imagen, a diffusion-based model tailored for surgical data, with novel triplet-based caption handling and class balancing techniques for improved image synthesis.
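The summary does not say how class balancing is implemented. One common approach, shown here purely as an assumption, is to sample training images with weights inversely proportional to the frequency of their triplet class, so that rare surgical actions are seen about as often as frequent ones. The function names and label strings below are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> List[float]:
    """One weight per sample, equal to 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def balanced_sample(labels: Sequence[str], k: int, seed: int = 0) -> List[int]:
    """Draw k sample indices with replacement; every class gets equal total weight."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Hypothetical toy dataset: 8 frames of a common triplet, 2 of a rare one.
labels = ["grasper,retract,gallbladder"] * 8 + ["clipper,clip,cystic_duct"] * 2
indices = balanced_sample(labels, k=1000, seed=1)
rare_share = sum(labels[i].startswith("clipper") for i in indices) / len(indices)
print(f"share of rare triplet: {rare_share:.2f}")  # expected near 0.5 (vs. 0.2 unbalanced)
```

Sampling with replacement keeps the number of draws per epoch fixed; weighting the loss per class is an equally plausible alternative the paper may use instead.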
Findings
Achieved an FID score of 3.7, indicating high image quality.
Human experts rated generated images as highly realistic.
Model effectively aligns images with complex surgical action descriptions.
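For reference, the Fréchet Inception Distance (lower is better) models the feature embeddings of real images (r) and generated images (g) as Gaussians with means $\mu$ and covariances $\Sigma$, and measures the distance between them with the standard definition below. The summary does not state which feature extractor or sample sizes produced the reported 3.7, so none is implied here.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```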
Abstract
Acquiring surgical data for research and development is significantly hindered by high annotation costs and by practical and ethical constraints. Synthetically generated images could offer a valuable alternative. In this work, we explore adapting text-to-image generative models to the surgical domain using the CholecT50 dataset, which provides surgical images annotated with action triplets (instrument, verb, target). We investigate several language models and find that T5 offers more distinct features for differentiating surgical actions from triplet-based textual inputs and shows stronger alignment between long and triplet-based captions. To address the challenges of training text-to-image models solely on triplet-based captions, without additional inputs or supervisory signals, we discover that triplet text embeddings are instrument-centric in the latent space. Leveraging this…
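To make "triplet-based caption" concrete, the sketch below turns CholecT50-style (instrument, verb, target) annotations into a text prompt. The exact caption template is not given in this summary; the "<instrument> <verb> <target>" wording and joining several triplets in one frame with "and" are assumptions, and the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    """One action annotation: (instrument, verb, target)."""
    instrument: str
    verb: str
    target: str


def triplet_caption(t: Triplet) -> str:
    """Render a triplet as '<instrument> <verb> <target>' (assumed template)."""
    return " ".join(part.replace("_", " ") for part in t)


def frame_caption(triplets: List[Triplet]) -> str:
    """Join the distinct triplets of one frame with 'and', keeping their order."""
    captions: List[str] = []
    for t in triplets:
        text = triplet_caption(t)
        if text not in captions:
            captions.append(text)
    return " and ".join(captions)


frame = [
    Triplet("grasper", "retract", "gallbladder"),
    Triplet("hook", "dissect", "cystic_duct"),
    Triplet("grasper", "retract", "gallbladder"),  # duplicate is dropped
]
print(frame_caption(frame))
# grasper retract gallbladder and hook dissect cystic duct
```

A caption like this would then be embedded by the text encoder (T5, per the abstract) to condition the diffusion model.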
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Digital Imaging in Medicine
Methods: Gated Linear Unit · Attention Is All You Need · Byte Pair Encoding · Linear Layer · Dropout · SentencePiece · Multi-Head Attention · Dense Connections · Softmax
