# Face-Focused Cross-Stream Network for Deception Detection in Videos

**Authors:** Mingyu Ding, An Zhao, Zhiwu Lu, Tao Xiang, and Ji-Rong Wen

arXiv: 1812.04429 · 2018-12-12

## TL;DR

This paper introduces a face-focused cross-stream network (FFCSN) for deception detection in videos. It fuses face and body cues through correlation learning across the spatial and temporal streams, and it is trained with meta learning and adversarial learning to cope with limited training data. The authors report state-of-the-art results.

## Contribution

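The paper proposes a face-focused cross-stream network (FFCSN) that adds face detection to the spatial stream to capture facial expressions explicitly, and performs correlation learning across the spatial and temporal streams for joint feature learning over face and body. To handle scarce training data, the model is trained with both meta learning and adversarial learning.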

## Key findings

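- FFCSN achieves state-of-the-art results on automated deception detection from real-life videos in the authors' experiments.
- Face and body cues are fused by adding face detection to the spatial stream and learning correlations across the spatial and temporal streams.
- Training with meta learning and adversarial learning addresses the scarcity of real-life deceptive samples.
- The model and its training strategy are shown to apply to other human-centric video analysis tasks, such as emotion recognition from user-generated videos.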

## Abstract

Automated deception detection (ADD) from real-life videos is a challenging task. It specifically needs to address two problems: (1) Both face and body contain useful cues regarding whether a subject is deceptive. How to effectively fuse the two is thus key to the effectiveness of an ADD model. (2) Real-life deceptive samples are hard to collect; learning with limited training data thus challenges most deep learning based ADD models. In this work, both problems are addressed. Specifically, for face-body multimodal learning, a novel face-focused cross-stream network (FFCSN) is proposed. It differs significantly from the popular two-stream networks in that: (a) face detection is added into the spatial stream to capture the facial expressions explicitly, and (b) correlation learning is performed across the spatial and temporal streams for joint deep feature learning across both face and body. To address the training data scarcity problem, our FFCSN model is trained with both meta learning and adversarial learning. Extensive experiments show that our FFCSN model achieves state-of-the-art results. Further, the proposed FFCSN model as well as its robust training strategy are shown to be generally applicable to other human-centric video analysis tasks such as emotion recognition from user-generated videos.
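To make the idea of cross-stream fusion concrete, the following toy sketch pools per-frame features from a face (spatial) stream and a motion (temporal) stream, measures how well the two agree, and feeds the fused vector to a logistic scorer. It is an illustration under assumptions, not the authors' FFCSN: the function names (`mean_pool`, `cross_stream_fuse`, `deception_score`), the use of cosine similarity as the correlation measure, the concatenation-based fusion, and the requirement that both streams share one feature dimension are all hypothetical choices made for this example. The actual model learns deep features and its correlation objective end to end.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one clip-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_stream_fuse(
    face_frames: Sequence[Sequence[float]],
    motion_frames: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate pooled face and motion features plus their similarity.

    Assumption: both streams emit features of the same dimension, so the
    similarity between them is well defined.
    """
    face_vec = mean_pool(face_frames)
    motion_vec = mean_pool(motion_frames)
    agreement = cosine_similarity(face_vec, motion_vec)
    return face_vec + motion_vec + [agreement]


def deception_score(
    fused: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Logistic score in (0, 1) from a fused feature vector."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Two frames per stream, two features per frame (made-up numbers).
face_frames = [[1.0, 0.0], [1.0, 2.0]]
motion_frames = [[2.0, 2.0], [0.0, 0.0]]
fused = cross_stream_fuse(face_frames, motion_frames)
score = deception_score(fused, weights=[0.0] * len(fused))
```

In a trained system the weights would be learned rather than fixed, and the agreement term would typically enter the training loss instead of being appended as a feature; the sketch only shows where information from the two streams meets.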

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.04429/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1812.04429/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/1812.04429/full.md

---
Source: https://tomesphere.com/paper/1812.04429