Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers | Tomesphere