Bora: Biomedical Generalist Video Generation Model
Weixiang Sun, Xiaocao You, Ruizhe Zheng, Zhengqing Yuan, Xiang Li, Lifang He, Quanzheng Li, Lichao Sun

TL;DR
Bora is a pioneering spatio-temporal diffusion model that generates high-quality, diverse biomedical videos from text prompts, aiding medical training, decision-making, and data augmentation.
Contribution
The paper introduces Bora, presented as the first Transformer-based diffusion model for text-guided biomedical video generation, together with a newly assembled corpus of paired text-video medical data.
Findings
Effective across endoscopy, ultrasound, MRI, and cell tracking modalities.
Outperforms existing models in biomedical video generation tasks.
Demonstrates potential for medical education and clinical decision support.
Abstract
Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for medical AI development. Diffusion models can now generate realistic images from text prompts, while recent advancements have demonstrated their ability to create diverse, high-quality videos. However, these models often struggle with generating accurate representations of medical procedures and detailed anatomical structures. This paper introduces Bora, the first spatio-temporal diffusion probabilistic model designed for text-guided biomedical video generation. Bora leverages Transformer architecture and is pre-trained on general-purpose video generation tasks. It is fine-tuned through model alignment and instruction tuning using a newly established medical video corpus, which includes paired text-video data from various biomedical fields. To the best of our knowledge,…
Taxonomy
Topics: Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis · Computational Physics and Python Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Diffusion · Adam · Dropout · Multi-Head Attention · Dense Connections
