Natural Language Person Search Using Deep Reinforcement Learning
Ankit Shah, Tyler Vuong

TL;DR
This paper proposes a deep reinforcement learning approach to natural language person search: an agent localizes a person in an image by iteratively refining a bounding box based on the text description and the image pixels, with the aim of improving both efficiency and accuracy.
Contribution
It introduces a constrained deep reinforcement learning method specifically designed for person search, focusing on bounding box refinement guided by natural language descriptions.
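The summary above does not specify the agent's action space or reward. The sketch below is an illustration under stated assumptions, not the authors' method. It borrows the discrete move/scale actions and the sign-of-IoU-change reward that are common in earlier RL object-localization work. The learned policy, which would condition on image features and the description, is replaced by a greedy oracle that can see the ground-truth box. All names (`Box`, `apply_action`, `iou`, `reward`, `refine`, `ALPHA`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action set (assumption, not from the paper): each action
# shifts or rescales the box by a fraction ALPHA of its width/height.
ALPHA = 0.2
ACTIONS = ["left", "right", "up", "down", "bigger", "smaller", "stop"]


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def apply_action(box: Box, action: str) -> Box:
    """Return a new box after one discrete refinement action."""
    dw = ALPHA * (box.x2 - box.x1)
    dh = ALPHA * (box.y2 - box.y1)
    if action == "left":
        return Box(box.x1 - dw, box.y1, box.x2 - dw, box.y2)
    if action == "right":
        return Box(box.x1 + dw, box.y1, box.x2 + dw, box.y2)
    if action == "up":
        return Box(box.x1, box.y1 - dh, box.x2, box.y2 - dh)
    if action == "down":
        return Box(box.x1, box.y1 + dh, box.x2, box.y2 + dh)
    if action == "bigger":
        return Box(box.x1 - dw / 2, box.y1 - dh / 2, box.x2 + dw / 2, box.y2 + dh / 2)
    if action == "smaller":
        return Box(box.x1 + dw / 2, box.y1 + dh / 2, box.x2 - dw / 2, box.y2 - dh / 2)
    return box  # "stop" leaves the box unchanged


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    inter_area = inter.area()
    union = a.area() + b.area() - inter_area
    return inter_area / union if union > 0 else 0.0


def reward(old: Box, new: Box, target: Box) -> float:
    """+1 if the action increased IoU with the target box, -1 otherwise."""
    return 1.0 if iou(new, target) > iou(old, target) else -1.0


def refine(start: Box, target: Box, max_steps: int = 20):
    """Greedy oracle standing in for a learned, description-conditioned policy.

    At each step it tries every non-stop action, keeps the one with the
    highest IoU, and stops when no action improves IoU.
    """
    box, total = start, 0.0
    for _ in range(max_steps):
        action = max(ACTIONS[:-1], key=lambda a: iou(apply_action(box, a), target))
        new_box = apply_action(box, action)
        if iou(new_box, target) <= iou(box, target):
            break  # no action helps, so the oracle chooses "stop"
        total += reward(box, new_box, target)
        box = new_box
    return box, total


if __name__ == "__main__":
    start = Box(0, 0, 100, 200)
    target = Box(40, 30, 140, 230)
    final, total = refine(start, target)
    print(final, total, round(iou(final, target), 3))
```

For this start and target, the oracle moves right twice and down once, and then stops. The resulting box has IoU of about 0.905 with the target. A real agent would have to choose these moves from the image and the description alone. Restricting the search to person-shaped boxes is where the constraint described above could save computation.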
Findings
Effective localization of persons using RL-based bounding box adjustments
Reduced computational resources compared to unconstrained object detection
Improved accuracy in person search tasks
Abstract
Recent successes in deep reinforcement learning include an agent that learned to play Go and beat the world champion without prior human knowledge of the game. In that task, the agent has to decide which action to take based on the positions of the pieces. Person search using natural language text descriptions of images has recently been explored for video surveillance applications (S. Li et al.). Fu et al. provide an end-to-end approach to object-based retrieval using deep reinforcement learning, with no constraints on which objects are detected. However, we believe that for real-world applications such as person search, defining specific constraints that identify a person, rather than starting from general object detection, will bring benefits in both performance and the computational resources required. In our task, deep reinforcement learning would localize the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
