Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks
Anastasiia D. Sokolova, Angelina S. Kharchevnikova, Andrey V. Savchenko

TL;DR
This paper presents a two-stage method for organizing video surveillance footage by detecting faces, tracking them across frames, and grouping similar faces using deep neural networks, with gender and age estimation for enhanced usability.
Contribution
The paper introduces a novel two-stage approach combining face detection, tracking, and clustering with deep CNN features for improved organization of surveillance videos.
Findings
Deep CNN features improve face verification accuracy.
Normalized average feature vectors yield the best matching results.
The approach was evaluated on the YTF and IJB-A datasets.
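As a rough illustration of the "normalized average feature vector" finding, the sketch below averages the per-frame CNN features of one track and then L2-normalizes the result. This is not the authors' code: the function names `aggregate_track` and `track_distance` are hypothetical, the choice of L2 normalization after averaging is one reading of "normalized average", and the Euclidean distance used for matching is an assumption.

```python
import numpy as np


def aggregate_track(frame_features):
    """Average the per-frame feature vectors of one track, then L2-normalize.

    frame_features: array-like of shape (n_frames, dim), one row per frame.
    Assumption: normalization is applied to the mean vector, not per frame.
    """
    features = np.asarray(frame_features, dtype=np.float64)
    if features.ndim != 2 or features.shape[0] == 0:
        raise ValueError("expected a non-empty (n_frames, dim) array")
    mean = features.mean(axis=0)
    norm = np.linalg.norm(mean)
    # Leave an all-zero mean untouched to avoid dividing by zero.
    return mean / norm if norm > 0 else mean


def track_distance(a, b):
    """Euclidean distance between two track descriptors (assumed metric)."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```

With this representation each track is compared through a single vector, so matching two tracks costs one distance computation regardless of how many frames they contain, which is consistent with the reported speed advantage.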
Abstract
In this paper we propose a two-stage approach to organizing information in video surveillance systems. First, faces are detected in each frame and the video stream is split into sequences of frames containing the face region of one person. Second, the sequences (tracks) that contain identical faces are grouped using face verification algorithms and hierarchical agglomerative clustering. Gender and age are estimated for each cluster (person) to facilitate use of the organized video collection. Particular attention is paid to the aggregation of features extracted from each frame with deep convolutional neural networks. Experimental results on the YTF and IJB-A datasets demonstrate that the most accurate and fastest solution is obtained by matching the normalized average of the feature vectors of all frames in a track.
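The second stage, grouping tracks with hierarchical agglomerative clustering, could be sketched as follows. This is a naive illustration under stated assumptions, not the paper's implementation: the function name `cluster_tracks` is hypothetical, average linkage and Euclidean distance are assumed because the abstract does not specify them, and the stopping `threshold` is a free parameter that would need tuning on real data. Each input row is taken to be one track descriptor (for example, a normalized average of that track's frame features).

```python
import numpy as np


def cluster_tracks(descriptors, threshold):
    """Naive average-linkage agglomerative clustering of track descriptors.

    descriptors: non-empty array-like of shape (n_tracks, dim).
    threshold:   merging stops once the closest pair of clusters is
                 farther apart than this value (assumed stopping rule).
    Returns an integer cluster label for each track.
    """
    x = np.asarray(descriptors, dtype=np.float64)
    n = x.shape[0]
    # Pairwise Euclidean distances between all track descriptors.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    clusters = [[i] for i in range(n)]  # start with one cluster per track

    while len(clusters) > 1:
        best_i, best_j, best_d = None, None, np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if d < best_d:
                    best_i, best_j, best_d = i, j, d
        if best_d > threshold:
            break  # remaining clusters are treated as different people
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]

    labels = np.empty(n, dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels
```

Each resulting cluster stands for one person, so per-cluster attributes such as gender and age would be estimated once per label rather than once per frame. The nested loop is cubic in the number of tracks and is meant only to make the procedure explicit.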
