Depth-Guided Self-Supervised Human Keypoint Detection via Cross-Modal Distillation

Aman Anand; Elyas Rashno; Amir Eskandari; Farhana Zulkernine

arXiv:2410.14700 · cs.CV · August 14, 2025

Open Access

TL;DR

This paper introduces Distill-DKP, a self-supervised framework that uses depth maps to improve human keypoint detection, significantly outperforming previous methods by leveraging cross-modal knowledge distillation.

Contribution

The paper presents a novel cross-modal knowledge distillation approach that incorporates depth information to enhance unsupervised human keypoint detection.

Findings

01. Reduces mean L2 error by 47.15% on Human3.6M

02. Improves keypoint accuracy by 1.3% on DeepFashion

03. Demonstrates effective knowledge transfer across network layers

Abstract

Existing unsupervised keypoint detection methods apply artificial deformations to images, such as masking a significant portion of the image, and use reconstruction of the original image as the learning objective for detecting keypoints. However, this approach has no access to depth information and often places keypoints on the background. To address this, we propose Distill-DKP, a novel cross-modal knowledge distillation framework that leverages depth maps and RGB images for keypoint detection in a self-supervised setting. During training, Distill-DKP extracts embedding-level knowledge from a depth-based teacher model to guide an image-based student model; inference uses the student alone. Experiments show that Distill-DKP significantly outperforms previous unsupervised methods, reducing mean L2 error by 47.15% on Human3.6M and mean average error by 5.67% on Taichi, and improving keypoints…
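To make the idea of embedding-level distillation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say which distance is used between teacher and student embeddings, so the cosine distance below is an assumption, as are the names `cosine_similarity`, `distillation_loss`, `total_loss`, and the weight `distill_weight`. In a real training framework the teacher embeddings would typically be treated as fixed targets while the student is updated.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A small floor on the denominator avoids division by zero.
    return dot / max(norm_a * norm_b, 1e-12)


def distillation_loss(student: Sequence[Vector], teacher: Sequence[Vector]) -> float:
    """Mean cosine distance (1 - cosine similarity) over paired embeddings.

    `student` holds embeddings from the RGB network and `teacher` holds the
    matching embeddings from the depth network (assumed pairing, one per sample).
    """
    if len(student) != len(teacher) or len(student) == 0:
        raise ValueError("need the same non-zero number of embeddings")
    distances = [1.0 - cosine_similarity(s, t) for s, t in zip(student, teacher)]
    return sum(distances) / len(distances)


def total_loss(reconstruction_loss: float,
               student: Sequence[Vector],
               teacher: Sequence[Vector],
               distill_weight: float = 1.0) -> float:
    """Reconstruction objective plus a weighted distillation term (hypothetical weighting)."""
    return reconstruction_loss + distill_weight * distillation_loss(student, teacher)
```

The loss is zero when each student embedding points in the same direction as its teacher counterpart, regardless of magnitude, and grows as the directions diverge. Because only the student is needed at inference time, the depth input and the teacher network can be dropped after training.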

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques

Methods: Knowledge Distillation