Human Activity Recognition Using Visual Object Detection

Schalk Wilhelm Pienaar; Reza Malekian

arXiv:1905.03707·cs.CV·May 10, 2019

TL;DR

This paper explores using visual object detection with SSD and data fusion techniques to improve human activity recognition in underground mining environments, aiming for accurate monitoring of miner states.

Contribution

The paper applies an SSD model trained on COCO to detect miner states and proposes data fusion with other sensors to improve activity recognition accuracy in mining settings.

Findings

01. SSD effectively detects miner states in complex environments

02. Data fusion improves activity recognition accuracy

03. The approach balances performance and development speed

Abstract

Visual Human Activity Recognition (HAR), fused with data from other sensors, can help track the behavior and activity of underground miners with little obstruction. In this paper an existing model, the Single Shot Detector (SSD), trained on the Common Objects in Context (COCO) dataset, is used to detect the current state of a miner, such as an injured miner versus a non-injured miner. TensorFlow serves as the abstraction layer for implementing the machine-learning algorithms: although it uses Python to work with nodes and tensors, the actual algorithms run on C++ libraries, providing a good balance between performance and speed of development. The paper further discusses evaluation methods for determining the accuracy of the machine-learning model, and an approach that uses data fusion to increase the accuracy of the detected activity or state of people in a mining environment.
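The abstract does not say which evaluation metrics the paper uses. As an illustration only, the sketch below shows one common way to score an object detector: match predicted boxes to ground-truth boxes by intersection over union (IoU), then compute precision and recall. The function names, the 0.5 IoU threshold, and the "injured" / "non_injured" labels are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels (assumed convention).
Box = Tuple[float, float, float, float]
Labelled = Tuple[str, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    detections: List[Labelled],
    ground_truth: List[Labelled],
    iou_threshold: float = 0.5,  # assumed threshold, not from the paper
) -> Tuple[float, float]:
    """Greedy one-to-one matching of detections to same-label ground truth."""
    matched = set()
    true_positives = 0
    for label, box in detections:
        best_idx, best_iou = None, 0.0
        for idx, (gt_label, gt_box) in enumerate(ground_truth):
            if idx in matched or gt_label != label:
                continue
            score = iou(box, gt_box)
            if score >= iou_threshold and score > best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical example: one correct detection, one false alarm, one miss.
detections = [("injured", (0, 0, 10, 10)), ("non_injured", (20, 20, 30, 30))]
ground_truth = [("injured", (1, 1, 10, 10)), ("non_injured", (50, 50, 60, 60))]
precision, recall = precision_recall(detections, ground_truth)
```

A detection counts as correct here only if its label matches and its box overlaps an unmatched ground-truth box by at least the threshold. Any confidence-score fusion with other sensors would sit upstream of this scoring step.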
