An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models