STEC: A Reference-Free Spatio-Temporal Entropy Coverage Metric for Evaluating Sampled Video Frames
Shih-Yao Lin

TL;DR
STEC is a reference-free metric that scores a set of sampled video frames by its spatio-temporal information coverage, intended as a diagnostic for frame sampling in video understanding pipelines.
Contribution
We introduce STEC, a lightweight, reference-free metric that assesses sampled video frames by their spatial information, temporal coverage, and redundancy, properties that existing perceptual-quality and reconstruction-fidelity metrics are not designed to measure.
Findings
STEC effectively differentiates various sampling strategies.
It reveals robustness patterns not captured by average performance.
STEC serves as a diagnostic tool for frame sampling quality.
Abstract
Frame sampling is a fundamental component in video understanding and video-language model pipelines, yet evaluating the quality of sampled frames remains challenging. Existing evaluation metrics primarily focus on perceptual quality or reconstruction fidelity, and are not designed to assess whether a set of sampled frames adequately captures informative and representative video content. We propose Spatio-Temporal Entropy Coverage (STEC), a simple and non-reference metric for evaluating the effectiveness of video frame sampling. STEC builds upon Spatio-Temporal Frame Entropy (STFE), which measures per-frame spatial information via entropy-based structural complexity, and evaluates sampled frames based on their temporal coverage and redundancy. By jointly modeling spatial information strength, temporal dispersion, and non-redundancy, STEC provides a principled and lightweight measure…
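The abstract names three ingredients (per-frame spatial information, temporal coverage, and non-redundancy) but this page does not give the STEC or STFE formulas. The sketch below is therefore not the paper's definition. It is a toy illustration of how three such terms could be combined, and every function name, the histogram-entropy proxy, the largest-gap coverage term, the pixel-difference redundancy term, and the product combination are assumptions made for this example.

```python
import math
from collections import Counter

# NOTE: hypothetical stand-ins, not the STEC/STFE definitions from the paper.

def frame_entropy(frame):
    """Shannon entropy in bits of a grayscale frame's intensity histogram."""
    pixels = [p for row in frame for p in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())

def temporal_coverage(indices, num_frames):
    """1 minus the largest unsampled gap as a fraction of the video length."""
    if num_frames < 2:
        return 1.0
    points = [0] + sorted(indices) + [num_frames - 1]
    max_gap = max(b - a for a, b in zip(points, points[1:]))
    return 1.0 - max_gap / (num_frames - 1)

def non_redundancy(frames):
    """Mean absolute pixel difference between consecutive sampled frames, in [0, 1]."""
    if len(frames) < 2:
        return 1.0
    diffs = []
    for f, g in zip(frames, frames[1:]):
        a = [p for row in f for p in row]
        b = [p for row in g for p in row]
        diffs.append(sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a)))
    return sum(diffs) / len(diffs)

def toy_coverage_score(video, indices):
    """Assumed combination: product of the three terms (8-bit frames assumed)."""
    sampled = [video[i] for i in sorted(indices)]
    spatial = sum(frame_entropy(f) for f in sampled) / (8 * len(sampled))
    return spatial * temporal_coverage(indices, len(video)) * non_redundancy(sampled)

# Synthetic 20-frame video of 4x4 frames whose brightness drifts over time.
video = [[[t * 10 + 4 * r + c for c in range(4)] for r in range(4)] for t in range(20)]
uniform = toy_coverage_score(video, [0, 9, 19])
clustered = toy_coverage_score(video, [0, 1, 2])
```

On this synthetic clip the evenly spread selection scores higher than the clustered one, which is the kind of separation between sampling strategies that a coverage-style metric is meant to expose. No reference frames or labels are needed, only the video and the chosen indices.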
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Pose and Action Recognition
