# Through-Wall Pose Imaging in Real-Time with a Many-to-Many Encoder/Decoder Paradigm

**Authors:** Kevin Meng, Yu Meng

arXiv: 1904.00739 · 2019-10-22

## TL;DR

This paper presents a deep learning approach for real-time, through-wall human pose imaging using RF signals, achieving accurate skeleton reconstruction without visual line-of-sight, through a novel many-to-many encoder/decoder framework.

## Contribution

The paper introduces a many-to-many imaging paradigm that integrates a Region Proposal Network (RPN) with a Long Short-Term Memory (LSTM) network, together with an original training pipeline for RF-based pose estimation behind occlusions.

## Key findings

- Accurately predicts human skeletons behind visual obstructions using RF signals.
- Develops a novel deep learning model combining CNN, RPN, and LSTM.
- Demonstrates real-time performance in through-wall pose imaging.

## Abstract

Overcoming the visual barrier and developing "see-through vision" has been one of mankind's long-standing dreams. Unlike visible light, Radio Frequency (RF) signals penetrate opaque obstructions and reflect highly off humans. This paper establishes a deep-learning model that can be trained to reconstruct continuous video of a 15-point human skeleton even through visual occlusion. The training process adopts a student/teacher learning procedure inspired by the Feynman learning technique, in which video frames and RF data are first collected simultaneously using a co-located setup containing an optical camera and an RF antenna array transceiver. Next, the video frames are processed with a computer-vision-based gait analysis "teacher" module to generate ground-truth human skeletons for each frame. Then, the same type of skeleton is predicted from corresponding RF data using a "student" deep-learning model consisting of a Residual Convolutional Neural Network (CNN), Region Proposal Network (RPN), and Recurrent Neural Network with Long Short-Term Memory (LSTM) that 1) extracts spatial features from RF images, 2) detects all people present in a scene, and 3) aggregates information over many time-steps, respectively. The model is shown to both accurately and completely predict the pose of humans behind visual obstruction solely using RF signals. Primary academic contributions include the novel many-to-many imaging methodology, unique integration of RPN and LSTM networks, and original training pipeline.
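
The student/teacher data flow and the many-to-many output pattern described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the function names (`build_training_pairs`, `predict_sequence`, `mean_keypoint_error`), the 2-D keypoint format, the error metric, and the toy teacher and student are all assumptions made for illustration. The CNN, RPN, and LSTM are replaced by a single stateful step function, and only one person per frame is handled.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

NUM_KEYPOINTS = 15  # skeleton size stated in the abstract

Point = Tuple[float, float]  # assumption: 2-D image coordinates
Skeleton = List[Point]


def build_training_pairs(
    video_frames: Sequence[Any],
    rf_frames: Sequence[Any],
    teacher: Callable[[Any], Skeleton],
) -> List[Tuple[Any, Skeleton]]:
    """Label each RF frame with the teacher's skeleton for the simultaneous video frame."""
    if len(video_frames) != len(rf_frames):
        raise ValueError("video and RF streams must have the same length")
    pairs = []
    for video, rf in zip(video_frames, rf_frames):
        skeleton = teacher(video)
        if len(skeleton) != NUM_KEYPOINTS:
            raise ValueError("teacher must return 15 keypoints")
        pairs.append((rf, skeleton))
    return pairs


def predict_sequence(
    rf_frames: Sequence[Any],
    step: Callable[[Any, Any], Tuple[Skeleton, Any]],
    state: Any,
) -> List[Skeleton]:
    """Many-to-many inference: one skeleton per input frame, with state carried over time."""
    outputs = []
    for rf in rf_frames:
        skeleton, state = step(rf, state)
        outputs.append(skeleton)
    return outputs


def mean_keypoint_error(predicted: Skeleton, truth: Skeleton) -> float:
    """Mean Euclidean distance between matching keypoints (hypothetical metric)."""
    return sum(math.dist(p, t) for p, t in zip(predicted, truth)) / len(truth)


# Toy stand-ins so the sketch runs end to end; real frames would be images.
def toy_teacher(video_frame: float) -> Skeleton:
    return [(video_frame + k, float(k)) for k in range(NUM_KEYPOINTS)]


def toy_student_step(rf_frame: float, state: float) -> Tuple[Skeleton, float]:
    # Blend the current frame into a running state, a crude stand-in
    # for recurrent aggregation over time steps.
    state = 0.5 * state + 0.5 * rf_frame
    return [(state + k, float(k)) for k in range(NUM_KEYPOINTS)], state


video = [0.0, 1.0, 2.0, 3.0]
rf = [0.0, 1.0, 2.0, 3.0]
pairs = build_training_pairs(video, rf, toy_teacher)
predictions = predict_sequence([r for r, _ in pairs], toy_student_step, state=0.0)
errors = [mean_keypoint_error(p, t) for p, (_, t) in zip(predictions, pairs)]
```

The sequence-in, sequence-out loop is what distinguishes a many-to-many model from a per-frame predictor: each output can draw on everything seen so far through the carried state. In a real system, the error between student and teacher skeletons would drive gradient updates of the student network.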

---
Source: https://tomesphere.com/paper/1904.00739