HighlightMe: Detecting Highlights from Human-Centric Videos
Uttaran Bhattacharya, Gang Wu, Stefano Petrangeli, Viswanathan Swaminathan, Dinesh Manocha

TL;DR
HighlightMe is a novel method that automatically detects engaging segments in human-centric videos using graph-based representations of poses and faces. It outperforms existing approaches without requiring user preferences.
Contribution
It introduces a domain- and user-preference-agnostic approach utilizing graph convolutions and autoencoders to identify highlights based on activity and interaction representations.
Findings
Achieves 4-12% higher mean average precision over state-of-the-art methods.
Operates without user preferences or dataset-specific fine-tuning.
Validated on four benchmark datasets (DSH, TVSum, PHD2, and SumMe) with consistent improvements.
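For readers unfamiliar with the metric named in the findings, the following is a minimal sketch of how mean average precision could be computed from per-frame highlight scores. It assumes binary ground-truth labels per frame (1 = highlight) and averages over videos. The function names and the input layout are illustrative assumptions, not the paper's evaluation code, and each benchmark may define its own protocol.

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision of one video's frame ranking (hypothetical helper).

    scores: predicted per-frame highlight scores.
    labels: assumed binary ground truth, 1 if the frame is a highlight.
    """
    # Rank frames from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            # Precision measured at the rank of each true highlight frame.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    videos: List[Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """Mean of per-video average precision over (scores, labels) pairs."""
    aps = [average_precision(s, l) for s, l in videos]
    return sum(aps) / len(aps) if aps else 0.0
```

A ranking that places every true highlight frame ahead of every non-highlight frame scores 1.0, and lower values indicate that non-highlight frames were ranked above true ones.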
Abstract
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos. Our method works on the graph-based representation of multiple observable human-centric modalities in the videos, such as poses and faces. We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions based on these modalities. We train our network to map the activity- and interaction-based latent structural representations of the different modalities to per-frame highlight scores based on the representativeness of the frames. We use these scores to compute which frames to highlight and stitch contiguous frames to produce the excerpts. We train our network on the large-scale AVA-Kinetics action dataset and evaluate it on four benchmark video highlight datasets: DSH, TVSum, PHD2, and SumMe. We observe a 4-12%…
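The final step described in the abstract, selecting frames by score and stitching contiguous frames into excerpts, can be illustrated with a short sketch. The fixed threshold, the `min_len` filter, and the function name are assumptions made for illustration. The abstract does not say how the authors select frames from the scores.

```python
from typing import List, Sequence, Tuple


def stitch_highlights(
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
    min_len: int = 1,        # assumed minimum excerpt length in frames
) -> List[Tuple[int, int]]:
    """Group contiguous above-threshold frames into excerpts.

    Returns inclusive (start_frame, end_frame) index pairs.
    """
    excerpts: List[Tuple[int, int]] = []
    start = None  # index where the current run of highlight frames began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at frame i - 1; keep it if it is long enough.
            if i - start >= min_len:
                excerpts.append((start, i - 1))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(scores) - start >= min_len:
        excerpts.append((start, len(scores) - 1))
    return excerpts
```

For example, calling `stitch_highlights(scores, threshold=0.5, min_len=2)` drops isolated single-frame spikes and keeps only runs of two or more consecutive highlighted frames.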
