FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model