# Spatiotemporal deep learning framework for predictive behavioral threat detection in surveillance footage

**Authors:** Asha Aruna Sheela Matta, Venkata Purna Chandra Sekhara Rao Manukonda

PMC · DOI: 10.3389/fdata.2026.1770989 · 2026-02-27

## TL;DR

This paper introduces a deep learning framework that combines CNN and LSTM to detect threatening behaviors in surveillance videos with high accuracy.

## Contribution

The novel contribution is an optimized spatiotemporal CNN-LSTM framework for robust and accurate video-based anomaly detection.

## Key findings

- The model achieves 98.1% accuracy on the DCSASS dataset with strong performance across cross-validation settings.
- The proposed method outperforms traditional ML models and recent deep learning baselines in precision, recall, and F1-score.
- Hyperparameter optimization and regularization strategies enhance convergence and generalization.

## Abstract

Anomaly detection in video surveillance remains a challenging problem due to complex human behaviors, temporal variability, and limited annotated data. This study proposes an optimized spatiotemporal deep learning (DL) framework that integrates a Convolutional Neural Network (CNN) for spatial feature extraction with a Long Short-Term Memory (LSTM) network for temporal dependency modeling. The CNN processes frame-level appearance information, while the LSTM captures sequential motion patterns across video frames, enabling effective representation of anomalous activities. Hyperparameter optimization and regularization strategies are employed to improve convergence stability and generalization performance. The proposed model is evaluated on the DCSASS surveillance dataset and the experimental results demonstrate that the optimized CNN-LSTM framework achieves an accuracy of 98.1%, with consistently high precision, recall, and F1-score across 3-fold, 5-fold, and 10-fold cross-validation settings. Comparative analysis shows that the proposed method outperforms conventional machine learning models and recent deep learning baselines, highlighting its effectiveness and robustness for practical video-based anomaly detection in surveillance environments.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12982034/full.md

---
Source: https://tomesphere.com/paper/PMC12982034