Scene-aware Human Pose Generation using Transformer

Jieteng Yao; Junjie Chen; Li Niu; Bin Sheng

arXiv:2308.02177 · cs.CV · August 7, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a transformer-based, template-driven approach for scene-aware human pose generation, leveraging pose templates and knowledge distillation to improve prediction accuracy in scene understanding tasks.

Contribution

The method associates each transformer query embedding with a pose template, predicts a scale and offsets for each template from the interaction between the queries and the scene feature map, and uses knowledge distillation to aid offset learning.

Findings

01. Effective in generating realistic human poses in scenes

02. Outperforms existing template-based methods

03. Validated on Sitcom dataset

Abstract

Affordance learning considers the interaction opportunities for an actor in a scene and thus has wide application in scene understanding and intelligent robotics. In this paper, we focus on contextual affordance learning, i.e., using affordance as context to generate a reasonable human pose in a scene. Existing scene-aware human pose generation methods can be divided into two categories depending on whether they use pose templates. Our proposed method belongs to the template-based category, which benefits from representative pose templates. Moreover, inspired by recent transformer-based methods, we associate each query embedding with a pose template and use the interaction between the query embeddings and the scene feature map to predict the scale and offsets for each pose template. In addition, we employ knowledge distillation to facilitate the offset learning given the…
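The query-and-template idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-head cross-attention step, one query per template, a linear head for the scale (passed through an exponential to keep it positive), a linear head for per-joint 2D offsets, and a final pose computed as scale times template plus offsets. The names `cross_attend`, `predict_poses`, and all weight parameters are hypothetical and chosen only for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear(x, weight, bias):
    """Apply y = W x + b, with W given as a list of rows."""
    return [dot(row, x) + b for row, b in zip(weight, bias)]


def cross_attend(queries, scene_tokens):
    """Single-head cross-attention (assumed form): each template query
    attends over the flattened scene feature map."""
    dim = len(queries[0])
    attended = []
    for query in queries:
        scores = [dot(query, token) / math.sqrt(dim) for token in scene_tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * token[d] for w, token in zip(weights, scene_tokens))
             for d in range(dim)]
        )
    return attended


def predict_poses(templates, queries, scene_tokens,
                  w_scale, b_scale, w_off, b_off):
    """For each pose template, predict a scale and per-joint offsets from
    its query, then form the pose as scale * template + offsets.
    The combination rule is an assumption, not taken from the paper."""
    attended = cross_attend(queries, scene_tokens)
    poses = []
    for template, feature in zip(templates, attended):
        scale = math.exp(linear(feature, w_scale, b_scale)[0])  # keep scale > 0
        flat = linear(feature, w_off, b_off)  # 2 values (dx, dy) per joint
        pose = [
            (scale * x + flat[2 * j], scale * y + flat[2 * j + 1])
            for j, (x, y) in enumerate(template)
        ]
        poses.append(pose)
    return poses
```

With all head weights and biases set to zero, the predicted scale is 1 and the offsets are 0, so each output pose equals its template. That makes the role of the learned heads easy to see: they only adjust a representative template to fit the scene.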

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging

Methods: Knowledge Distillation · Focus