Audio-Visual Evaluation of Oratory Skills
Tzvi Michelson, Shmuel Peleg

TL;DR
This paper investigates how facial expressions, gestures, and vocal features influence the success of talks, using neural networks trained on TED Talks to predict success based solely on oratory skills.
Contribution
It introduces a neural network approach that automatically learns relevant oratory features from videos, predicting talk success without relying on expert annotations.
Findings
Oratory skills significantly impact talk success.
A neural network can assess facial, gestural, and vocal cues directly from video.
Automatic learning surpasses hand-crafted annotation methods.
Abstract
What makes a talk successful? Is it the content or the presentation? We try to estimate the contribution of the speaker's oratory skills to the talk's success, while ignoring the content of the talk. By oratory skills we refer to facial expressions, motions and gestures, as well as the vocal features. We use TED Talks as our dataset, and measure the success of each talk by its view count. Using this dataset we train a neural network to assess the oratory skills in a talk through three factors: body pose, facial expressions, and acoustic features. Most previous work on automatic evaluation of oratory skills uses hand-crafted expert annotations for both the quality of the talk and for the identification of predefined actions. Unlike prior art, we measure the quality to be equivalent to the view count of the talk as counted by TED, and allow the network to automatically learn the…
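As a rough illustration of the setup the abstract describes (three feature streams feeding one predictor of view count), here is a minimal sketch. Everything specific in it is an assumption for illustration and not taken from the paper: the fusion by concatenation, the single linear layer standing in for the neural network, the log-scaled view count target, the feature sizes, and the toy numbers.

```python
import math

def fuse(pose, face, audio):
    """Concatenate per-talk feature vectors (assumed fusion scheme, not the paper's)."""
    return list(pose) + list(face) + list(audio)

def predict(weights, bias, features):
    """Linear score; a stand-in for the neural network's output."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def mse(weights, bias, samples, targets):
    """Mean squared error over all talks."""
    errors = [(predict(weights, bias, x) - y) ** 2 for x, y in zip(samples, targets)]
    return sum(errors) / len(errors)

def train(samples, targets, lr=0.05, epochs=500):
    """Per-sample gradient descent on squared error, starting from zero weights."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Hypothetical toy data: (pose features, face features, audio features, view count).
talks = [
    ([0.2, 0.1], [0.9], [0.3], 1_200_000),
    ([0.7, 0.4], [0.5], [0.8], 4_500_000),
    ([0.1, 0.9], [0.2], [0.6], 300_000),
    ([0.5, 0.5], [0.7], [0.1], 900_000),
]

samples = [fuse(pose, face, audio) for pose, face, audio, _views in talks]
# Log-scaling the view count is an assumption made to keep the targets in a small range.
targets = [math.log10(views) for _pose, _face, _audio, views in talks]

before = mse([0.0] * len(samples[0]), 0.0, samples, targets)
weights, bias = train(samples, targets)
after = mse(weights, bias, samples, targets)
```

Comparing `before` and `after` shows whether the fitted weights reduce the training error on the toy data. A real system would replace the hand-written features with ones extracted from video and audio, and the linear layer with a learned network.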
