Scene-aware Human Pose Generation using Transformer
Jieteng Yao, Junjie Chen, Li Niu, Bin Sheng

TL;DR
This paper introduces a transformer-based, template-driven approach to scene-aware human pose generation: each query embedding is tied to a pose template and predicts how that template should be scaled and offset to fit the scene, with knowledge distillation used to facilitate offset learning.
Contribution
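The method associates each transformer query embedding with a pose template, uses the interaction between the queries and the scene feature map to predict each template's scale and offsets, and applies knowledge distillation to facilitate offset learning.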
Findings
Effective in generating realistic human poses in scenes
Outperforms existing template-based methods
Validated on the Sitcom dataset
Abstract
Affordance learning considers the interaction opportunities for an actor in the scene and thus has wide application in scene understanding and intelligent robotics. In this paper, we focus on contextual affordance learning, i.e., using affordance as context to generate a reasonable human pose in a scene. Existing scene-aware human pose generation methods can be divided into two categories depending on whether they use pose templates. Our proposed method belongs to the template-based category and thus benefits from representative pose templates. Moreover, inspired by recent transformer-based methods, we associate each query embedding with a pose template and use the interaction between the query embeddings and the scene feature map to effectively predict the scale and offsets for each pose template. In addition, we employ knowledge distillation to facilitate the offset learning given the…
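The abstract describes queries that each carry a pose template, attend to the scene feature map, and output a scale and offsets for that template. The NumPy sketch below illustrates that idea under several assumptions that are not taken from the paper: a single-head cross-attention layer with a residual update, linear heads for scale and offsets, the composition `pose = scale * template + offsets`, and a mean-squared-error distillation term. The abstract is cut off before saying what the teacher receives, so `teacher_offsets` is only a placeholder array. All function names, weights, and sizes (`K`, `J`, `d`, `HW`) are hypothetical and randomly initialized for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, scene_feats, w_q, w_k, w_v):
    """Single-head cross-attention: each template query attends over all scene locations."""
    q = queries @ w_q                      # (K, d)
    k = scene_feats @ w_k                  # (HW, d)
    v = scene_feats @ w_v                  # (HW, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (K, HW)
    return queries + attn @ v              # residual update, (K, d)


def decode_poses(updated, templates, w_scale, w_offset):
    """Hypothetical heads: one positive scale and J 2-D offsets per template query."""
    K, J, _ = templates.shape
    scale = np.exp(updated @ w_scale)                # (K, 1); exp keeps the scale positive
    offsets = (updated @ w_offset).reshape(K, J, 2)  # (K, J, 2)
    poses = scale[:, :, None] * templates + offsets  # assumed composition, (K, J, 2)
    return poses, offsets


def distillation_loss(student_offsets, teacher_offsets):
    """Mean squared error pulling the student's offsets toward fixed teacher offsets."""
    return float(np.mean((student_offsets - teacher_offsets) ** 2))


# Hypothetical sizes: K templates, J joints, feature width d, HW scene locations.
K, J, d, HW = 4, 17, 32, 64
templates = rng.normal(size=(K, J, 2))   # stand-in pose templates
queries = rng.normal(size=(K, d))        # one query embedding per template
scene = rng.normal(size=(HW, d))         # flattened scene feature map
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
w_scale = rng.normal(size=(d, 1)) * 0.01
w_offset = rng.normal(size=(d, 2 * J)) * 0.01

updated = cross_attention(queries, scene, w_q, w_k, w_v)
poses, offsets = decode_poses(updated, templates, w_scale, w_offset)

teacher_offsets = offsets + 0.1          # placeholder teacher prediction
loss = distillation_loss(offsets, teacher_offsets)
print(poses.shape, round(loss, 4))
```

Because the scale is produced through `exp` and the offsets are added on top of the template, zero-initialized heads reproduce the templates exactly, so training starts from the representative poses and learns only the scene-specific adjustment.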
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging
Methods: Knowledge Distillation · Focus
