# Visual intelligence for efficient human action recognition in human-computer interaction applications

**Authors:** Noorah Alghasham, Waleed Albattah

PMC · DOI: 10.1371/journal.pone.0343132 · 2026-03-05

## TL;DR

This paper introduces a deep learning model that pairs a CNN (a pre-trained EfficientNetB7) with an RNN (an LSTM) for efficient and accurate human action recognition in human-computer interaction.

## Contribution

A novel HAR model that uses a pre-trained EfficientNetB7 for spatial feature extraction and an LSTM for temporal modeling, reaching high accuracy at low computational cost without data augmentation.

## Key findings

- The model achieved 97.8% accuracy on the UCF101 dataset.
- On the HMDB51 dataset it reached 80.1% accuracy, outperforming existing models.
- The authors report reduced computational complexity and no need for auxiliary techniques such as data augmentation.

## Abstract

Human Action Recognition (HAR) is a pivotal area in computer vision, video surveillance, and human-computer interaction (HCI), driven by the need for efficient and accurate models to enhance HCI experiences. Traditional HAR methods often rely on hand-crafted features and shallow learning techniques, which limits their ability to capture complex patterns. In contrast, this study proposes an efficient HAR model that leverages deep neural networks, specifically a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), to enhance HCI through AI-powered action understanding. The model employs a pre-trained EfficientNetB7 network to extract rich spatial features from video frames, followed by a Long Short-Term Memory (LSTM) network to capture long-range temporal dependencies. This architecture enhances recognition accuracy while reducing computational complexity, making it highly suitable for HCI applications. Experimental results demonstrate the superior performance of the model, achieving a classification accuracy of 97.8% on the UCF101 dataset and 80.1% on the HMDB51 dataset, outperforming state-of-the-art HAR models. The proposed model eliminates the need for auxiliary assistive techniques like data augmentation, highlighting its efficiency and tremendous potential for real-world HCI applications that rely on accurate and efficient recognition of human actions.
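
The CNN-then-LSTM pipeline described in the abstract can be illustrated with a small sketch. The pure-Python code below is not the authors' implementation. It assumes each video frame has already been converted to a feature vector (in the paper, by a pre-trained EfficientNetB7) and shows only how an LSTM folds a sequence of such vectors into a single clip-level representation, which a softmax layer then scores against the action classes. All function names, the toy dimensions, and the random weights are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    """Small uniform random weights (illustrative; a real model learns these)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def make_lstm_params(input_size, hidden_size, rng):
    """One (W, U, b) triple per gate: input i, forget f, candidate g, output o."""
    return {
        gate: {
            "W": random_matrix(hidden_size, input_size, rng),
            "U": random_matrix(hidden_size, hidden_size, rng),
            "b": [0.0] * hidden_size,
        }
        for gate in ("i", "f", "g", "o")
    }


def affine(p, x, h):
    """Compute W @ x + U @ h + b row by row."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b
        for w_row, u_row, b in zip(p["W"], p["U"], p["b"])
    ]


def lstm_step(params, x, h, c):
    """Standard LSTM update for one frame's feature vector x."""
    i = [sigmoid(v) for v in affine(params["i"], x, h)]
    f = [sigmoid(v) for v in affine(params["f"], x, h)]
    g = [math.tanh(v) for v in affine(params["g"], x, h)]
    o = [sigmoid(v) for v in affine(params["o"], x, h)]
    # The cell state keeps part of its past (f * c) and admits new content (i * g).
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode_clip(params, frame_features, hidden_size):
    """Run the LSTM over all frames in order; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frame_features:
        h, c = lstm_step(params, x, h, c)
    return h


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(params, w_out, frame_features, hidden_size):
    """Return class probabilities for one clip (output bias omitted for brevity)."""
    h = encode_clip(params, frame_features, hidden_size)
    logits = [sum(w * hj for w, hj in zip(row, h)) for row in w_out]
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sizes: real CNN features are far wider than 32 values per frame.
    feature_dim, hidden_size, num_frames = 32, 16, 10
    num_classes = 101  # UCF101 has 101 action classes
    params = make_lstm_params(feature_dim, hidden_size, rng)
    w_out = random_matrix(num_classes, hidden_size, rng)
    # Stand-in for per-frame CNN features (random numbers, not real data).
    frames = [[rng.random() for _ in range(feature_dim)] for _ in range(num_frames)]
    probs = classify_clip(params, w_out, frames, hidden_size)
    print("predicted class index:", max(range(num_classes), key=probs.__getitem__))
```

Because the weights are random, the predicted class is meaningless. The sketch only illustrates the data flow: per-frame spatial features go in, the LSTM's cell state carries information across frames, and the final hidden state summarizes the clip for classification.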

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12962486/full.md

---
Source: https://tomesphere.com/paper/PMC12962486