PrAViC: Probabilistic Adaptation Framework for Real-Time Video Classification

Magdalena Trędowicz; Marcin Mazur; Szymon Janusz; Arkadiusz Lewicki; Jacek Tabor; Łukasz Struski

arXiv:2406.11443·cs.CV·August 14, 2025·1 cites

Open Access · 3 Reviews

TL;DR

PrAViC introduces a unified, theoretically-grounded framework for real-time video classification that enables faster decision-making without sacrificing accuracy by adapting offline models for online use.

Contribution

It provides a novel mathematical foundation and a simple method for adapting offline video classification models to online, real-time scenarios.

Findings

01 Significantly reduces decision time in online video classification

02 Maintains or improves accuracy compared to existing methods

03 Offers a straightforward implementation for real-time adaptation

Abstract

Video processing is generally divided into two main categories: processing of the entire video, which typically yields optimal classification outcomes, and real-time processing, where the objective is to make a decision as promptly as possible. Although the models dedicated to the processing of entire videos are typically well-defined and clearly presented in the literature, this is not the case for online processing, where a plethora of hand-devised methods exist. To address this issue, we present PrAViC, a novel, unified, and theoretically-based adaptation framework for tackling the online classification problem in video data. The initial phase of our study is to establish a mathematical background for the classification of sequential data, with the potential to make a decision at an early stage. This allows us to construct a natural function that encourages the model to return a…
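To make the early-decision setting concrete, here is a minimal Python sketch of the simple fixed-threshold baseline that the reviewers below ask about. It is not PrAViC's method, which this page does not fully describe. It assumes a binary task and a hypothetical stream of per-frame positive-class probabilities; the names `classify_online`, `probs`, and `threshold` are illustrative only.

```python
from typing import Optional, Sequence, Tuple


def first_exit_frame(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first frame whose probability reaches the threshold, else None."""
    for t, p in enumerate(probs):
        if p >= threshold:
            return t
    return None


def classify_online(probs: Sequence[float], threshold: float = 0.9) -> Tuple[int, int]:
    """Fixed-threshold early exit (illustrative baseline, not the paper's method).

    Returns (label, decision_frame): label 1 as soon as a frame's probability
    reaches the threshold, otherwise label 0 decided at the last frame.
    """
    if len(probs) == 0:
        raise ValueError("probs must contain at least one frame")
    t = first_exit_frame(probs, threshold)
    if t is None:
        # No frame was confident enough: the whole video had to be watched.
        return 0, len(probs) - 1
    return 1, t


# Hypothetical per-frame probabilities for one video.
label, frame = classify_online([0.10, 0.40, 0.95, 0.99], threshold=0.9)
```

In this sketch the decision frame is the quantity an online method would try to make small while keeping the label correct; a lower threshold exits earlier at the cost of more false positives.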

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The proposed solution is reasonable and the paper is well written.
2. The experimental results are impressive and may help with other subsequent tasks.

Weaknesses

1. More baselines should be added to validate the effectiveness of the proposed approach, for example other online mechanisms, even a simple threshold-based one.
2. Some discussion should be added to explain the intuition behind the method, and more ablation studies should be added to validate the findings and the contributions.

Reviewer 02 · Rating 3 · Confidence 4

Strengths

- The related work section is well presented and easy to read
- Section 3, up to “Expected time of early exit”, gives a clear overview of the problem that the paper wants to address

Weaknesses

- The paper is generally not clear from Section 3 on
- The proposed approach is never well defined. A custom loss function is proposed, but it is never said how to then use the model to do inference with early exit. Instead the paper would benefit from a comparison between a baseline model, with a fixed threshold, and the proposed finetuned ones, to compare the time the two models take to classify the video as 0/1 above the significance threshold picked
- The theoretical part is highly confusi

Reviewer 03 · Rating 5 · Confidence 3

Strengths

- Good and helpful images.
- The probabilistic adaptation framework is a clever idea to handle the need for classification as fast as possible.
- The architectural modifications are simple and easy to implement in any offline CNN structure.
- The experiments cover a sufficient range of datasets and relevant 3D-CNN models to show the effectiveness of the proposed framework.
- Comparison with online and offline models.
- Another method for online video classification is used for comparison.

Weaknesses

- The paper does not mention the SOTA models for video classification and action recognition, such as VideoMAE and other transformer-based architectures. Transformer models are a well-established SOTA, especially for some of the evaluated datasets.
- The paper discusses only 3D-CNNs and does not clarify such a limitation in the abstract or contributions. The framework's adaptations do not include different architectures, such as transformers or even CNN-LSTM.

Papers: VideoMAE V2: Scaling Video

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition