Text2Action: Generative Adversarial Synthesis from Language to Action

Hyemin Ahn, Timothy Ha, Yunho Choi, Hwiyeon Yoo, and Songhwai Oh

arXiv:1710.05298 · cs.LG · October 25, 2017

TL;DR

This paper introduces a GAN-based model that translates natural language descriptions into human action sequences, enabling robots or virtual agents to perform actions aligned with textual input.

Contribution

It presents a novel sequence-to-sequence GAN framework, trained on 29,770 action-sentence pairs extracted from the MSR-VTT video dataset, that generates diverse, human-like actions from language descriptions.

Findings

01. Successfully generates human-like actions from text
02. Transfers actions to a Baxter robot for real-world execution
03. Models the language-action relationship accurately

Abstract

In this paper, we propose a generative model which learns the relationship between language and human action in order to generate a human action sequence given a sentence describing human behavior. The proposed generative model is a generative adversarial network (GAN), which is based on the sequence to sequence (SEQ2SEQ) model. Using the proposed generative network, we can synthesize various actions for a robot or a virtual agent using a text encoder recurrent neural network (RNN) and an action decoder RNN. The proposed generative network is trained from 29,770 pairs of actions and sentence annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video dataset. We demonstrate that the network can generate human-like actions which can be transferred to a Baxter robot, such that the robot performs an action based on a provided sentence. Results show that the proposed…
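The text encoder RNN and action decoder RNN described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the plain tanh cells, the toy random word embeddings, the noise vector fed to the decoder, all dimensions, and every name (Text2ActionGenerator, RNNCell, encode, generate) are assumptions made for illustration. Adversarial training and the discriminator are omitted, so the weights stay random and the output poses are meaningless; only the data flow from sentence to action sequence is shown.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RNNCell:
    """Plain tanh RNN cell: h' = tanh(W_x x + W_h h). Hypothetical stand-in."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.w_x = make_matrix(hidden_dim, input_dim, rng)
        self.w_h = make_matrix(hidden_dim, hidden_dim, rng)

    def step(self, x, h):
        a = matvec(self.w_x, x)
        b = matvec(self.w_h, h)
        return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


class Text2ActionGenerator:
    """Untrained seq2seq generator sketch; all sizes are assumed values."""

    def __init__(self, embed_dim=8, hidden_dim=16, noise_dim=4, pose_dim=6, seed=0):
        self.rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.hidden_dim = hidden_dim
        self.noise_dim = noise_dim
        self.pose_dim = pose_dim
        self.embeddings = {}  # toy word embeddings, created on first use
        self.encoder = RNNCell(embed_dim, hidden_dim, self.rng)
        # Assumption: the decoder sees the previous pose plus a noise vector.
        self.decoder = RNNCell(pose_dim + noise_dim, hidden_dim, self.rng)
        self.w_out = make_matrix(pose_dim, hidden_dim, self.rng)

    def embed(self, word):
        if word not in self.embeddings:
            self.embeddings[word] = [
                self.rng.uniform(-1, 1) for _ in range(self.embed_dim)
            ]
        return self.embeddings[word]

    def encode(self, sentence):
        """Text encoder RNN: fold the word sequence into one hidden state."""
        h = [0.0] * self.hidden_dim
        for word in sentence.lower().split():
            h = self.encoder.step(self.embed(word), h)
        return h

    def generate(self, sentence, num_frames=5):
        """Action decoder RNN: unroll pose vectors from the encoded sentence."""
        h = self.encode(sentence)
        z = [self.rng.gauss(0, 1) for _ in range(self.noise_dim)]
        pose = [0.0] * self.pose_dim
        frames = []
        for _ in range(num_frames):
            h = self.decoder.step(pose + z, h)
            pose = matvec(self.w_out, h)  # linear readout to a pose vector
            frames.append(pose)
        return frames


if __name__ == "__main__":
    gen = Text2ActionGenerator()
    action = gen.generate("a man is waving his hand", num_frames=5)
    print(len(action), len(action[0]))  # number of frames, pose dimension
```

In a full GAN setup, a discriminator would score sentence-action pairs and its feedback would be used to train the generator weights; that part is left out here because the abstract does not describe it in detail.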

Code & Models

Repositories

hiddenmaze/text2action (TensorFlow)
