PRISM: Perceptual Recognition for Identifying Standout Moments in Human-Centric Keyframe Extraction
Mert Can Cakmak, Nitin Agarwal, Diwash Poudel

TL;DR
PRISM is a lightweight, interpretable, and perceptually aligned keyframe extraction framework that efficiently identifies impactful moments in videos, supporting content moderation and summarization without relying on deep learning.
Contribution
PRISM introduces a novel keyframe extraction method based on perceptual color differences that is training-free, computationally efficient, and effective across diverse video datasets.
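The Contribution line does not name a specific color-difference formula. For reference, the simplest standard perceptual difference in CIELAB is CIE76 ΔE*ab, the Euclidean distance between two colors (L*₁, a*₁, b*₁) and (L*₂, a*₂, b*₂). Whether PRISM uses this formula or a refined one such as CIEDE2000 is not stated here.

```latex
\Delta E^{*}_{ab} = \sqrt{\left(L^{*}_{2} - L^{*}_{1}\right)^{2} + \left(a^{*}_{2} - a^{*}_{1}\right)^{2} + \left(b^{*}_{2} - b^{*}_{1}\right)^{2}}
```

Because CIELAB was designed to be approximately perceptually uniform, equal ΔE*ab values correspond roughly to equal perceived differences; a value near 2.3 is commonly cited as a just-noticeable difference.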
Findings
Achieves high accuracy and fidelity in keyframe extraction
Maintains high compression ratios across datasets
Effective in both structured and unstructured video content
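The summary does not define compression ratio. One definition common in keyframe-extraction work, assumed here for illustration, is the fraction of frames discarded, where N_k is the number of extracted keyframes and N is the total number of frames. For a hypothetical video of 3000 frames summarized by 30 keyframes:

```latex
\mathrm{CR} = 1 - \frac{N_k}{N}, \qquad \mathrm{CR} = 1 - \frac{30}{3000} = 1 - 0.01 = 0.99
```

Under this definition, a higher ratio means a more compact summary; it says nothing on its own about whether the retained frames are the right ones, which is why it is reported alongside accuracy and fidelity.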
Abstract
Online videos play a central role in shaping political discourse and amplifying cyber social threats such as misinformation, propaganda, and radicalization. Detecting the most impactful or "standout" moments in video content is crucial for content moderation, summarization, and forensic analysis. In this paper, we introduce PRISM (Perceptual Recognition for Identifying Standout Moments), a lightweight and perceptually aligned framework for keyframe extraction. PRISM operates in the CIELAB color space and uses perceptual color difference metrics to identify frames that align with human visual sensitivity. Unlike deep-learning-based approaches, PRISM is interpretable, training-free, and computationally efficient, making it well suited for real-time and resource-constrained environments. We evaluate PRISM on four benchmark datasets (BBC, TVSum, SumMe, and ClipShots) and demonstrate that it…
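The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under stated assumptions: each frame is a flat list of 8-bit sRGB pixels, the perceptual difference is CIE76 ΔE*ab averaged over pixels, and a frame becomes a keyframe when that average, measured against the most recently selected keyframe, exceeds a fixed threshold. The function names and the default threshold of 10.0 are hypothetical; this is not the authors' implementation, and PRISM's actual metric and selection rule may differ.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Lab = Tuple[float, float, float]

# D65 reference white in CIE XYZ, scaled so that Y = 1.0
_XN, _YN, _ZN = 0.95047, 1.0, 1.08883


def _linearize(c: float) -> float:
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB nonlinearity (cube root with a linear segment near zero)."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(rgb: RGB) -> Lab:
    """Convert one 8-bit sRGB pixel to CIELAB (D65 white point)."""
    r, g, b = (_linearize(v / 255.0) for v in rgb)
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    fx, fy, fz = _f(x / _XN), _f(y / _YN), _f(z / _ZN)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(p: Lab, q: Lab) -> float:
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(p, q)


def mean_delta_e(frame_a: Sequence[RGB], frame_b: Sequence[RGB]) -> float:
    """Average per-pixel CIE76 difference between two equally sized frames."""
    diffs = [delta_e76(srgb_to_lab(a), srgb_to_lab(b)) for a, b in zip(frame_a, frame_b)]
    return sum(diffs) / len(diffs)


def select_keyframes(frames: List[Sequence[RGB]], threshold: float = 10.0) -> List[int]:
    """Hypothetical selection rule: keep frame 0, then keep any frame whose
    mean difference from the last kept keyframe exceeds `threshold`."""
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if mean_delta_e(frames[keyframes[-1]], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes
```

For example, a sequence of all-black, all-black, all-white, all-white, all-black frames yields keyframes at indices 0, 2, and 4. Comparing against the last selected keyframe, rather than the immediately preceding frame, lets slow, gradual changes accumulate until they cross the threshold. Per-pixel Python loops are slow, so a practical version would vectorize the color conversion, for example with NumPy.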
