# Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents

**Authors:** Nusrah Hussain, Engin Erzin, T. Metin Sezgin, and Yucel Yemez

arXiv: 1908.02037 · 2019-08-07

## TL;DR

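This paper trains a backchannel-generating agent for human-robot interaction with batch reinforcement learning and a recurrent value function, using a recorded human-to-human dataset. Off-policy policy evaluation indicates that the resulting RL agents are expected to produce more engagement than an agent trained by imitation learning.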

## Contribution

The paper presents a batch RL approach to backchannel generation that uses recurrent layers in the approximate value function and is trained on recorded human-to-human interaction data, with the goal of maintaining a high level of user engagement.
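
To illustrate why a recurrent value function can help when the current observation alone is ambiguous, here is a minimal sketch of a one-unit Elman-style recurrence that maps an observation sequence to per-action Q-values. It is not the authors' architecture: the function name `recurrent_q_values`, the scalar hidden state, and the hand-set weights are all assumptions chosen for illustration, and nothing here is trained.

```python
import math

def recurrent_q_values(observations, w_in, w_rec, w_out):
    """Return per-step Q-values from a one-unit recurrent network.

    observations: sequence of scalar observations.
    w_in, w_rec:  input and recurrent weights (hypothetical, hand-set).
    w_out:        one output weight per action.
    """
    hidden = 0.0
    q_per_step = []
    for obs in observations:
        # The hidden state carries a summary of everything seen so far.
        hidden = math.tanh(w_in * obs + w_rec * hidden)
        q_per_step.append([w * hidden for w in w_out])
    return q_per_step

# Two histories that end in the same observation (0.0) but differ earlier.
weights = dict(w_in=1.0, w_rec=1.0, w_out=(0.5, 1.0))
q_after_signal = recurrent_q_values([1.0, 0.0], **weights)
q_no_signal = recurrent_q_values([0.0, 0.0], **weights)
```

The final observation is identical in both sequences, yet the final Q-values differ because the hidden state remembers the earlier input. A feed-forward value function fed only the last observation could not make that distinction.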

## Key findings

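- Recurrent layers in the approximate value function improve performance in this partially observable setting.
- Under off-policy policy evaluation, the RL agents are expected to produce more engagement than an agent trained by imitation learning.
- A recorded human-to-human dataset, treated as batch data collected by a behavior policy, stands in for online learning with human users, which is time-consuming and impractical.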
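
The batch setting in the last point can be sketched with tabular Q-learning replayed over a fixed log of transitions. This is a simplified stand-in, not the paper's recurrent value-based method: the state labels, the two actions (0 = stay silent, 1 = laugh), the reward values, and the names `batch_q_learning` and `greedy_action` are all hypothetical.

```python
from collections import defaultdict

def batch_q_learning(transitions, n_actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Tabular Q-learning over a fixed batch of logged transitions.

    transitions: list of (state, action, reward, next_state, done) tuples.
    The agent never interacts with an environment; it only replays the log.
    """
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(sweeps):
        for state, action, reward, next_state, done in transitions:
            # Bootstrapped target; terminal transitions use the reward alone.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
    return q

def greedy_action(q, state):
    """Pick the action with the highest learned value in `state`."""
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)

# Made-up log: states are coarse engagement labels, rewards are invented.
toy_log = [
    ("low", 0, 0.0, "low", False),
    ("low", 1, 1.0, "high", False),
    ("high", 0, 1.0, "high", True),
    ("high", 1, 0.0, "low", True),
]
q_table = batch_q_learning(toy_log, n_actions=2)
```

On this toy log the learned greedy policy laughs when engagement is low and stays silent when it is high, which reflects only the invented rewards and says nothing about the paper's results.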

## Abstract

The ability to generate appropriate verbal and non-verbal backchannels by an agent during human-robot interaction greatly enhances the interaction experience. Backchannels are particularly important in applications like tutoring and counseling, which require constant attention and engagement of the user. We present here a method for training a robot for backchannel generation during human-robot interaction within the reinforcement learning (RL) framework, with the goal of maintaining a high engagement level. Since online learning by interaction with a human is highly time-consuming and impractical, we take advantage of a recorded human-to-human dataset and approach our problem as a batch reinforcement learning problem. The dataset is utilized as batch data acquired by some behavior policy. We perform experiments with laughs as a backchannel and train an agent with value-based techniques. In particular, we demonstrate the effectiveness of recurrent layers in the approximate value function for this problem, which boost performance in partially observable environments. With off-policy policy evaluation, it is shown that the RL agents are expected to produce more engagement than an agent trained from imitation learning.
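
The abstract does not say which off-policy policy evaluation estimator is used. As one common possibility, the sketch below implements ordinary per-trajectory importance sampling, which reweights returns logged under a behavior policy to estimate the expected return of a different target policy. The function name, the toy episodes, and both policies are assumptions made for illustration.

```python
def importance_sampling_value(trajectories, target_prob, behavior_prob, gamma=1.0):
    """Ordinary importance-sampling estimate of a target policy's return.

    trajectories:  list of episodes, each a list of (state, action, reward).
    target_prob:   function (state, action) -> probability under the target policy.
    behavior_prob: function (state, action) -> probability under the logging policy.
    """
    total = 0.0
    for episode in trajectories:
        weight = 1.0    # product of per-step probability ratios
        ret = 0.0       # discounted return of the logged episode
        discount = 1.0
        for state, action, reward in episode:
            weight *= target_prob(state, action) / behavior_prob(state, action)
            ret += discount * reward
            discount *= gamma
        total += weight * ret
    return total / len(trajectories)

# Hypothetical log: the behavior policy picks each of two actions with
# probability 0.5; the target policy always picks action 1.
logged = [[("s", 1, 1.0)], [("s", 0, 0.0)]]
estimate = importance_sampling_value(
    logged,
    target_prob=lambda s, a: 1.0 if a == 1 else 0.0,
    behavior_prob=lambda s, a: 0.5,
)
```

Episodes in which the logged action has zero probability under the target policy get weight zero, and the rest are scaled up to compensate. The estimator requires the behavior policy to assign nonzero probability to every action the target policy might take.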

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.02037/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1908.02037/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1908.02037/full.md

---
Source: https://tomesphere.com/paper/1908.02037