Leveraging Human Revisions for Improving Text-to-Layout Models

Amber Xie; Chin-Yi Cheng; Forrest Huang; Yang Li

arXiv:2405.13026 · cs.CL · May 24, 2024

Open Access · 3 Reviews

TL;DR

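This paper uses detailed human revisions as feedback to improve text-to-layout generative models: a reward model is trained on how expert designers revise generated layouts, and the generative model is then optimized against that reward with reinforcement learning from human feedback (RLHF), yielding more designer-aligned outputs.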

Contribution

It proposes a novel approach to incorporate nuanced human feedback through revisions into the training of layout generation models, enhancing alignment with designer preferences.

Findings

01. Generated layouts are more aligned with modern design standards.
02. The reward model effectively captures human revision patterns.
03. Reinforcement learning from human feedback improves model outputs.

Abstract

Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior works have mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. On the other hand, many domains could benefit from more involved, detailed feedback, such as revisions, explanations, and reasoning of human users. Our work proposes using nuanced feedback through the form of human revisions for stronger alignment. In this paper, we ask expert designers to fix layouts generated from a generative layout model that is pretrained on a large-scale dataset of mobile screens. Then, we train a reward model based on how human designers revise these generated layouts. With the learned reward model, we optimize our model with reinforcement learning from human feedback (RLHF). Our method, Revision-Aware Reward Models (RARE), allows a…
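
The reward-modeling idea in the abstract can be illustrated with a small sketch. The abstract does not say how revisions are turned into reward targets, so everything below is an assumption made for illustration only: the `Element` type, matching elements by list position, the `move_tol` threshold, and the step-count cost are hypothetical and should not be read as the authors' formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One layout element: a type label and a normalized bounding box (hypothetical schema)."""
    kind: str
    x: float
    y: float
    w: float
    h: float


def revision_cost(generated, revised, move_tol=0.01):
    """Count illustrative edit steps between a generated layout and its human revision.

    Assumptions: elements are matched by list position; each unmatched element
    counts as one add/delete step; a matched element whose type changed or whose
    box moved or resized by more than move_tol counts as one step.
    """
    steps = abs(len(generated) - len(revised))
    for g, r in zip(generated, revised):
        shift = max(abs(g.x - r.x), abs(g.y - r.y), abs(g.w - r.w), abs(g.h - r.h))
        if g.kind != r.kind or shift > move_tol:
            steps += 1
    return steps


def reward_target(generated, revised):
    """Regression target for a reward model: fewer revision steps means higher reward."""
    return -float(revision_cost(generated, revised))


# A generated layout and a designer's revision of it (made-up values).
generated = [
    Element("button", 0.10, 0.80, 0.30, 0.10),
    Element("text", 0.10, 0.10, 0.80, 0.10),
]
revised = [
    Element("button", 0.35, 0.80, 0.30, 0.10),  # button moved to the right
    Element("text", 0.10, 0.10, 0.80, 0.10),    # unchanged
    Element("image", 0.10, 0.30, 0.80, 0.40),   # image added
]
print(reward_target(generated, revised))  # one move + one addition
```

Under these assumptions, a layout that designers barely touch receives a target near zero, while a heavily revised layout receives a strongly negative one; a learned reward model would then be fit to such targets before the RLHF stage.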

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. This paper proposes a novel approach to integrate a different form of human feedback, i.e., step-by-step revision sequences, into model training.
2. The reward is designed to correlate with revision time, which provides better signals than binary comparison rewards.

Weaknesses

1. Though the paper presents a new notion of human feedback, i.e., revision sequences, its application to layout generation makes its applicability quite constrained. The first time I read the abstract, I thought the paper seemed to propose a general methodology for RLHF. After going through the paper, I realized that the proposed reward training is only specifically designed for the text-to-layout generation domain.
2. The evaluation is not sound to me. In section, the major quantitative evalu

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

The paper investigates using more nuanced feedback rather than binary preference, which is a less explored area of research.

Weaknesses

The paper's outline does not follow a typical format: the dataset description comes late, and I struggled to imagine the dataset while reading the experiments section without having read that description first. The equations used in the paper aren't fully explained, and in some places the symbols used in the equation and the description are inconsistent. Because of this, I did not get a full understanding of the background from reading the paper. Maybe the authors can reduce the size of the figures or move

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

- The paper collects high-quality datasets from expert app layout designers. The dataset could be important for the community.
- The method is simple but effective.
- The proposed method is much more effective than simple finetuning.

Weaknesses

- The novelty might be limited. It seems that the novel part of the method is how training samples are constructed from the collected dataset to train RARE. Other parts, like the diffusion models and RLHF, are similar to existing work.
- I am not quite convinced by sec. 4.2 on the reward model pretraining. The construction of pretraining data seems a bit too heuristic and is not grounded in any reasonable arguments or observations. Why do you assume dropping needs 1 time step, revised elements need

Taxonomy

Topics: Handwritten Text Recognition Techniques · Topic Modeling · Human Motion and Animation