Frame Aggregation and Multi-Modal Fusion Framework for Video-Based Person Recognition

Fangtao Li; Wenzhe Wang; Zihe Liu; Haoran Wang; Chenghao Yan; Bin Wu

arXiv:2010.09290 · cs.CV · January 1, 2021

TL;DR

This paper introduces a novel framework combining frame aggregation and multi-modal fusion to improve video-based person recognition, effectively handling occlusions, blurring, and angle variations.

Contribution

The paper proposes AttentionVLAD for adaptive frame aggregation and MLMA for multi-modal correlation learning, advancing video person recognition techniques.

Findings

01. Outperforms state-of-the-art methods on the iQIYI-VID-2019 dataset
02. Effectively reduces the impact of low-quality frames
03. Enhances multi-modal feature integration

Abstract

Video-based person recognition is challenging because persons may be occluded or blurred, and the shooting angle varies. Previous research has focused on person recognition in still images, ignoring the similarity and continuity between video frames. To tackle these challenges, we propose a novel Frame Aggregation and Multi-Modal Fusion (FAMF) framework for video-based person recognition, which aggregates face features and combines them with multi-modal information to identify persons in videos. For frame aggregation, we propose a novel trainable layer based on NetVLAD (named AttentionVLAD), which takes an arbitrary number of features as input and computes a fixed-length aggregation feature based on feature quality. We show that introducing an attention mechanism to NetVLAD can effectively decrease the impact of low-quality frames. For the multi-modal information of videos, we…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Gait Recognition and Analysis