Detection and Analysis of Content Creator Collaborations in YouTube Videos using Face- and Speaker-Recognition
Moritz Lode, Michael Örtl, Christian Koch, Amr Rizk, Ralf Steinmetz

TL;DR
This paper extends CATANA, an existing face-recognition framework for detecting YouTube content creator collaborations, with speaker recognition and active speaker detection, addressing the limitations of face-only methods on videos in which no faces appear.
Contribution
It introduces an extended framework combining face recognition with speaker recognition and active speaker detection to improve collaboration detection accuracy in videos.
Findings
Improved detection accuracy over face-only methods
Effective integration of speaker recognition with face recognition
Addresses limitations in videos without visible faces
Abstract
This work discusses and implements the application of speaker recognition for the detection of collaborations in YouTube videos. CATANA, an existing framework for the detection and analysis of YouTube collaborations, uses face recognition to detect collaborators, which naturally performs poorly on video content in which no faces appear. This work proposes an extension of CATANA that uses active speaker detection and speaker recognition to improve detection accuracy.
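As a rough illustration of the idea in the abstract, the sketch below shows one way per-video detections from a face recognizer and a speaker recognizer could be merged into a set of collaborators. It is not CATANA's implementation: the `Detection` record, the `collaborators` function, the confidence scores, and the 0.5 threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One recognizer hit in a video (hypothetical record, not CATANA's schema)."""
    video_id: str
    creator_id: str  # identity label assigned by a recognizer
    modality: str    # "face" or "speaker"
    score: float     # assumed confidence in [0, 1]


def collaborators(detections, video_owner, threshold=0.5):
    """Map each non-owner creator to the modalities that detected them.

    A creator counts as a collaborator if either modality detects them
    with a score at or above the (assumed) threshold.
    """
    found = {}
    for d in detections:
        if d.score < threshold or d.creator_id == video_owner:
            continue
        found.setdefault(d.creator_id, set()).add(d.modality)
    return found


if __name__ == "__main__":
    dets = [
        Detection("v1", "creator_a", "face", 0.9),     # channel owner, ignored
        Detection("v1", "creator_b", "speaker", 0.8),  # heard but never seen
        Detection("v1", "creator_c", "face", 0.7),
        Detection("v1", "creator_c", "speaker", 0.6),
        Detection("v1", "creator_d", "face", 0.2),     # below threshold
    ]
    print(collaborators(dets, video_owner="creator_a"))
```

In this toy input, `creator_b` is only ever heard, so a face-only pipeline would miss that collaboration; taking the union of both modalities is the simplest possible fusion rule and stands in for whatever combination the extended framework actually uses.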
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Video Analysis and Summarization
