Learning Visual Voice Activity Detection with an Automatically Annotated Dataset