# AU R-CNN: Encoding Expert Prior Knowledge into R-CNN for Action Unit Detection

**Authors:** Chen Ma, Li Chen, Junhai Yong

arXiv: 1812.05788 · 2019-08-27

## TL;DR

This paper introduces AU R-CNN, a novel model that incorporates expert prior knowledge into region definitions for improved facial action unit detection, achieving state-of-the-art results using only static images.

## Contribution

The paper proposes AU R-CNN, which encodes expert prior knowledge into region and label definitions, and shows that it outperforms both existing AU detection methods and AU R-CNN variants fused with dynamic models.

## Key findings

- AU R-CNN outperforms existing approaches on BP4D and DISFA datasets.
- AU R-CNN using only static RGB images (no optical flow) surpasses its variants fused with dynamic models.
- AU R-CNN achieves state-of-the-art AU detection performance.

## Abstract

Detecting action units (AUs) on human faces is challenging because various AUs cause subtle changes in facial appearance over different regions and at different scales. Current works have attempted to recognize AUs by emphasizing important regions. However, the incorporation of expert prior knowledge into region definition remains under-exploited, and current AU detection approaches do not use regional convolutional neural networks (R-CNN) with expert prior knowledge to directly and adaptively focus on AU-related regions. By incorporating expert prior knowledge, we propose a novel R-CNN-based model named AU R-CNN. The proposed solution offers two main contributions: (1) AU R-CNN directly observes the different facial regions where various AUs are located. Specifically, we define an AU partition rule that encodes expert prior knowledge into the region definition and the RoI-level label definition. This design produces considerably better detection performance than existing approaches. (2) We integrate various dynamic models (convolutional long short-term memory, a two-stream network, a conditional random field, and a temporal action localization network) into AU R-CNN, and then investigate and analyze the reasons behind the performance of these dynamic models. Experimental results demonstrate that AU R-CNN using *only* static RGB image information, without optical flow, surpasses the variants fused with dynamic models. AU R-CNN is also superior to traditional CNNs that use the same backbone across varying image resolutions. State-of-the-art AU detection performance is achieved. The complete network is end-to-end trainable. Experiments on the BP4D and DISFA datasets show the effectiveness of our approach. The implementation code is available online.
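The idea of an AU partition rule can be illustrated with a minimal Python sketch. Everything concrete in it is a hypothetical assumption: the region names, the AU-to-region table (`REGION_TO_AUS`), and the landmark indices (`REGION_TO_LANDMARKS`) are made up for illustration and are not the paper's actual definitions. Assuming, for illustration only, that regions are derived from facial landmarks, the sketch shows two ideas: a region is defined as a bounding box over a subset of landmarks, and each RoI's label keeps only the ground-truth AUs that the rule assigns to that region.

```python
# Minimal sketch of an AU partition rule. All tables below are HYPOTHETICAL
# illustrations, not the region/label definitions used in the paper.
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Hypothetical: which AUs expert knowledge places in each facial region.
REGION_TO_AUS: Dict[str, Set[int]] = {
    "brow": {1, 2, 4},
    "eye": {6, 7},
    "mouth": {10, 12, 14, 15, 17, 23, 24},
}

# Hypothetical: which landmark indices outline each region.
REGION_TO_LANDMARKS: Dict[str, List[int]] = {
    "brow": [0, 1, 2],
    "eye": [3, 4],
    "mouth": [5, 6, 7],
}


def region_boxes(landmarks: Sequence[Point], margin: float = 0.0) -> Dict[str, Box]:
    """Region definition: an axis-aligned box around each region's landmarks, padded by `margin`."""
    boxes: Dict[str, Box] = {}
    for region, idx in REGION_TO_LANDMARKS.items():
        xs = [landmarks[i][0] for i in idx]
        ys = [landmarks[i][1] for i in idx]
        boxes[region] = (min(xs) - margin, min(ys) - margin,
                         max(xs) + margin, max(ys) + margin)
    return boxes


def roi_labels(image_aus: Set[int]) -> Dict[str, List[int]]:
    """RoI-level label definition: each region keeps only the active AUs assigned to it."""
    return {region: sorted(image_aus & aus) for region, aus in REGION_TO_AUS.items()}
```

For example, `roi_labels({1, 4, 6, 12})` returns `{"brow": [1, 4], "eye": [6], "mouth": [12]}`, so in this toy setup the brow RoI is supervised only with brow-related AUs rather than with the full image-level label set.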

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.05788/full.md

## Figures

35 figures with captions in the complete paper: https://tomesphere.com/paper/1812.05788/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1812.05788/full.md

---
Source: https://tomesphere.com/paper/1812.05788