Online Multi-modal Person Search in Videos
Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, Dahua Lin

TL;DR
This paper introduces an online multimodal person search framework that recognizes people in a video on the fly. It maintains a multimodal memory bank and updates it dynamically with a policy learned through reinforcement learning, and it outperforms both existing online schemes and offline methods on a large movie dataset.
Contribution
The paper presents a novel online person search method with a dynamic multimodal memory bank and reinforcement learning-based update policy, enabling real-time recognition in videos.
Findings
Recognizes people in videos on the fly, without first examining the entire video
Outperforms existing online and offline methods
Demonstrated on a large movie dataset
Abstract
The task of searching certain people in videos has seen increasing potential in real-world applications, such as video organization and editing. Most existing approaches are devised to work in an offline manner, where identities can only be inferred after an entire video is examined. This working manner precludes such methods from being applied to online services or those applications that require real-time responses. In this paper, we propose an online person search framework, which can recognize people in a video on the fly. This framework maintains a multimodal memory bank at its heart as the basis for person recognition, and updates it dynamically with a policy obtained by reinforcement learning. Our experiments on a large movie dataset show that the proposed method is effective, not only achieving remarkable improvements over online schemes but also outperforming offline methods.
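The recognize-then-update loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the class `MemoryBank`, the modality names `"face"` and `"audio"`, the cosine-similarity matching, and in particular the fixed confidence threshold, which stands in for the update policy that the paper obtains by reinforcement learning.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class MemoryBank:
    """Hypothetical multimodal memory bank: identity -> modality -> features."""

    def __init__(self, update_threshold=0.9):
        self.slots = {}
        # Stand-in for the learned update policy (an assumption, not the paper's).
        self.update_threshold = update_threshold

    def enroll(self, identity, modality, feature):
        """Store one feature vector for an identity under a given modality."""
        self.slots.setdefault(identity, {}).setdefault(modality, []).append(list(feature))

    def score(self, identity, observation):
        """Average, over modalities stored for this identity, of the best match."""
        sims = []
        for modality, feature in observation.items():
            stored = self.slots[identity].get(modality, [])
            if stored:
                sims.append(max(cosine(feature, s) for s in stored))
        return sum(sims) / len(sims) if sims else 0.0

    def recognize(self, observation):
        """Return the best-matching identity and its score for one observation."""
        if not self.slots:
            return None, 0.0
        best = max(self.slots, key=lambda i: self.score(i, observation))
        return best, self.score(best, observation)

    def step(self, observation):
        """Online step: predict immediately, then decide whether to update memory."""
        identity, confidence = self.recognize(observation)
        if identity is not None and confidence >= self.update_threshold:
            for modality, feature in observation.items():
                self.enroll(identity, modality, feature)
        return identity, confidence


bank = MemoryBank(update_threshold=0.9)
bank.enroll("alice", "face", [1.0, 0.0])
bank.enroll("bob", "face", [0.0, 1.0])

# A confident match is recognized at once and its features are written back,
# including the new audio cue that was not in memory before.
first_identity, first_conf = bank.step({"face": [0.9, 0.1], "audio": [0.5, 0.5]})

# A less confident match is still labeled, but memory is left unchanged.
second_identity, second_conf = bank.step({"face": [0.6, 0.5]})
```

In this sketch, a prediction is returned for every incoming observation without waiting for the rest of the video, which is what distinguishes the online setting from the offline one. A learned policy would replace the single threshold with a decision that can depend on the state of the memory and the observation.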
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Face recognition and analysis
