STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment