Dual Reinforcement-Based Specification Generation for Image De-Rendering

Ramakanth Pasunuru; David Rosenberg; Gideon Mann; Mohit Bansal

arXiv:2103.01867 · cs.CL · March 3, 2021

TL;DR

This paper compares LSTM and Transformer decoders for graphics program inference from images, revealing Transformer robustness to sequence order and introducing reinforcement learning with multiple rewards to improve decoding quality, achieving state-of-the-art results.

Contribution

The paper introduces a policy-gradient reinforcement learning approach with multiple rewards to improve the decoder's inductive bias, and it compares LSTM and Transformer decoders for graphics program inference.

Findings

01. Transformers are less sensitive to sequence ordering than LSTMs.

02. Reinforcement learning with multiple rewards improves graphics program generation.

03. The approach achieves state-of-the-art results on two datasets.

Abstract

Advances in deep learning have led to promising progress in inferring graphics programs by de-rendering computer-generated images. However, current methods do not explore which decoding methods lead to better inductive bias for inferring graphics programs. In our work, we first explore the effectiveness of LSTM-RNN versus Transformer networks as decoders for order-independent graphics programs. Since these are sequence models, we must choose an ordering of the objects in the graphics programs for likelihood training. We found that the LSTM performance was highly sensitive to the sequence ordering (random order vs. pattern-based order), while Transformer performance was roughly independent of the sequence ordering. Further, we present a policy gradient based reinforcement learning approach for better inductive bias in the decoder via multiple diverse rewards based both on the graphics…
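As a rough illustration of the policy-gradient idea mentioned in the abstract, the sketch below computes a REINFORCE-style surrogate loss for one sampled output sequence when several rewards are available. It is not taken from the paper: the reward names (`reward_a`, `reward_b`), the weighted-sum combination, the mixing weights, and the constant baseline are all assumptions made for illustration only.

```python
import math


def combined_reward(rewards, weights):
    """Mix several scalar rewards into one (weighted sum is an assumption)."""
    return sum(weights[name] * value for name, value in rewards.items())


def reinforce_loss(log_probs, rewards, weights, baseline=0.0):
    """REINFORCE surrogate loss for a single sampled sequence.

    log_probs: log-probabilities the decoder assigned to each sampled token.
    rewards:   dict of hypothetical reward name -> scalar reward.
    weights:   dict of hypothetical reward name -> mixing weight.
    baseline:  scalar subtracted from the reward to reduce variance.
    """
    advantage = combined_reward(rewards, weights) - baseline
    # Minimizing this loss raises the likelihood of sequences whose
    # combined reward exceeds the baseline, and lowers it otherwise.
    return -advantage * sum(log_probs)


# Hypothetical example: two sampled tokens, each with probability 0.5.
example_loss = reinforce_loss(
    log_probs=[math.log(0.5), math.log(0.5)],
    rewards={"reward_a": 1.0, "reward_b": 0.5},
    weights={"reward_a": 0.5, "reward_b": 0.5},
    baseline=0.25,
)
```

In a real training setup the log-probabilities would come from the decoder and the loss would be differentiated by an autodiff framework; here the numbers are fixed by hand only to show how the terms fit together.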

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Video Analysis and Summarization

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Byte Pair Encoding · Layer Normalization · Residual Connection · Adam · Dropout · Tanh Activation