# Football sports automatic judgment model based on improved YOLOv7 and RNN

**Authors:** Ting Wang, Xiao Yan, Jiawei Li, Xilong Luo

PMC · DOI: 10.1371/journal.pone.0334158 · PLOS One · 2025-11-05

## TL;DR

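This paper proposes a football video judgment model in two stages: a YOLOv7 detector, improved with a clustering algorithm and an attention mechanism, identifies the targets in a scene, and an RNN whose parameters are tuned by the sparrow search algorithm extracts the target scene. The model reports a detection accuracy of 0.993 at 264.245 fps.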

## Contribution

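The contribution is a two-stage pipeline for sports video scene recognition: a YOLOv7 detector improved with a clustering algorithm and an attention mechanism, followed by an RNN whose parameter search is optimized by the sparrow search algorithm.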

## Key findings

- The model achieved a detection accuracy of 0.993 and a detection speed of 264.245 fps.
- On the TrackingNet dataset, it reached an intersection over union (IoU) of 0.885 and a recall of 0.961, the highest among the compared models.
- The scene extraction model reached an accuracy of 0.932, an F1 of 0.955, and an area under the receiver operating characteristic curve of 0.969, which the authors link to more accurate and fairer football judgment.
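
As a rough illustration of the intersection over union and recall figures listed above, the sketch below computes both for axis-aligned boxes. The `(x1, y1, x2, y2)` box format, the greedy matching rule, and the 0.5 threshold are assumptions made for this example; the paper's own evaluation protocol is not described in this summary.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth: List[Box], predictions: List[Box],
           threshold: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by a prediction.

    Each prediction may match at most one ground-truth box
    (greedy matching, an assumption of this sketch).
    """
    if not ground_truth:
        return 0.0
    unused = list(predictions)
    hits = 0
    for gt in ground_truth:
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= threshold:
            hits += 1
            unused.remove(best)
    return hits / len(ground_truth)
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so their IoU is 1/7; a detector that finds one of two ground-truth boxes has a recall of 0.5.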

## Abstract

Extracting, classifying, and judging sports video scenes automatically can improve work efficiency and accuracy. To understand sports videos in dynamic scenes, this study applies deep learning in two stages. First, a clustering algorithm and an attention mechanism are introduced to improve the You Only Look Once v7 (YOLOv7) object detector, which identifies the targets present in the scene. Second, the sparrow search algorithm is used to optimize the parameter search of a recurrent neural network (RNN), which automatically extracts the target scene. With the three optimization strategies in place, the proposed model achieved a detection accuracy of 0.993 (measured as classification accuracy), a floating-point computation count of 244, and a detection speed of 264.245 fps. The average detection accuracy of the model was 0.95, and its loss curve converged in the fewest iterations and to the lowest convergence value. The maximum correlation accuracy was 0.958, and the detection accuracy was 0.926. The model also had the highest intersection over union and recall on different datasets, reaching 0.885 and 0.961 respectively on the TrackingNet dataset. The improved scene extraction model had the smallest values on all three error measures, with a highest accuracy of 0.932, an F1 of 0.955, and an area under the receiver operating characteristic curve of 0.969. The R-squared value and semantic consistency of scene extraction also performed well, improving the accuracy and fairness of football judgment. The study offers a new approach to sports video scene recognition that improves recognition accuracy and gives the field of sports video analysis an additional technical tool. It also contributes to the development of the sports industry and promotes the automation and popularization of football.
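
The abstract reports an F1 score and an R-squared value without defining them. The sketch below uses the standard textbook definitions as a reference point; the function names and the input formats are hypothetical, and the paper may compute these quantities differently (for instance, averaged over classes).

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positive, false positive, and false negative counts.

    Equivalent to the harmonic mean of precision and recall.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant targets")
    return 1.0 - ss_res / ss_tot
```

With 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so F1 is 0.8. An R-squared of 1.0 means the predictions match the targets exactly, while 0.0 means they do no better than predicting the mean.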

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12588467/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12588467/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12588467/full.md

---
Source: https://tomesphere.com/paper/PMC12588467