Text2Stereo: Repurposing Stable Diffusion for Stereo Generation with Consistency Rewards
Aakash Garg, Libing Zeng, Andrii Tsarov, Nima Khademi Kalantari

TL;DR
This paper introduces Text2Stereo, a diffusion-based method that fine-tunes Stable Diffusion with consistency rewards to generate high-quality stereo images from text prompts, addressing dataset scarcity and improving stereo consistency.
Contribution
It presents a novel fine-tuning approach for Stable Diffusion using stereo consistency rewards to generate stereo images from text prompts.
Findings
Outperforms existing methods in stereo image quality
Achieves high stereo consistency and text alignment
Effective even with limited stereo datasets
Abstract
In this paper, we propose a novel diffusion-based approach to generate stereo images given a text prompt. Since stereo image datasets with large baselines are scarce, training a diffusion model from scratch is not feasible. Therefore, we propose leveraging the strong priors learned by Stable Diffusion and fine-tuning it on stereo image datasets to adapt it to the task of stereo generation. To improve stereo consistency and text-to-image alignment, we further tune the model using prompt alignment and our proposed stereo consistency reward functions. Comprehensive experiments demonstrate the superiority of our approach in generating high-quality stereo images across diverse scenarios, outperforming existing methods.
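The abstract does not spell out how the stereo consistency reward is computed. As a rough illustration only, and not the authors' formulation, one common way to score stereo consistency is to warp the right image into the left view using a disparity map and measure the photometric error. The sketch below assumes a rectified pair and a known per-pixel disparity map; the function names `warp_right_to_left` and `stereo_consistency_reward` are hypothetical and do not come from the paper.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Resample the right image at column x - d so it aligns with the left view.

    right:     (H, W, C) float array
    disparity: (H, W) float array of non-negative horizontal disparities
    Returns the warped image and a mask of pixels whose source lies in bounds.
    """
    h, w = disparity.shape
    xs = np.arange(w)[None, :] - disparity        # source columns in the right image
    valid = (xs >= 0) & (xs <= w - 1)
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = (xs - x0)[..., None]                   # linear interpolation weight
    rows = np.arange(h)[:, None]
    warped = (1 - frac) * right[rows, x0] + frac * right[rows, x1]
    return warped, valid

def stereo_consistency_reward(left, right, disparity):
    """Hypothetical reward: negative mean absolute photometric error.

    Values closer to zero mean the two views agree better under the
    given disparity. This is an assumed stand-in, not the paper's reward.
    """
    warped, valid = warp_right_to_left(right, disparity)
    if not valid.any():
        return 0.0
    err = np.abs(left - warped).mean(axis=-1)     # per-pixel error, averaged over channels
    return float(-err[valid].mean())

# Synthetic check: a right view that is the left view shifted by 4 pixels.
rng = np.random.default_rng(0)
left = rng.random((8, 16, 3))
right = np.roll(left, -4, axis=1)
good = stereo_consistency_reward(left, right, np.full((8, 16), 4.0))
bad = stereo_consistency_reward(left, right, np.zeros((8, 16)))
```

In a reward-based fine-tuning loop, a score of this kind would be computed on generated pairs and used to favor samples whose two views agree; how the paper obtains correspondences and combines this with the prompt alignment reward is not stated in the abstract.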
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Multimodal Machine Learning Applications
