AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation