SPLIT: SE(3)-diffusion via Local Geometry-based Score Prediction for 3D Scene-to-Pose-Set Matching Problems

Kanghyun Kim; Min Jun Kim

arXiv:2411.10049·cs.RO·November 18, 2024

TL;DR

This paper introduces SPLIT, an SE(3)-diffusion model that predicts local geometry-based scores to match 3D scenes with pose sets, enabling flexible robot manipulation without task-specific heuristics.

Contribution

The paper presents an SE(3)-diffusion approach to 3D scene-to-pose-set matching that predicts scores from the local geometry around each sample pose, allowing a single model to generate poses for multiple purposes.

Findings

01

Matches scenes to pose sets for both mug reorientation and hanging manipulation.

02

Generates multiple relevant poses conditioned on the scene.

03

Achieves flexible pose prediction without relying on task-specific heuristics.

Abstract

To enable versatile robot manipulation, robots must detect task-relevant poses for different purposes from raw scenes. Currently, many perception algorithms are designed for specific purposes, which limits the flexibility of the perception module. We present a general problem formulation called 3D scene-to-pose-set matching, which directly matches the corresponding poses from the scene without relying on task-specific heuristics. To address this problem, we introduce SPLIT, an SE(3)-diffusion model for generating pose samples from a scene. The model's efficiency comes from predicting scores based on local geometry with respect to the sample pose. Moreover, leveraging the conditioned generation capability of diffusion models, we demonstrate that SPLIT can generate the multi-purpose poses required to complete both mug reorientation and hanging manipulation within a single model.
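As a rough illustration of what "local geometry with respect to the sample pose" could mean in practice, the sketch below expresses scene points in the frame of a candidate pose and keeps only those near the pose origin. This is not the authors' implementation: the function names, the fixed crop radius, and the toy scene are all assumptions made for illustration, and the score network that would consume the cropped points is not shown.

```python
import math

def to_local_frame(point, rotation, translation):
    """Return R^T (p - t): the point expressed in the pose's own frame."""
    d = [point[i] - translation[i] for i in range(3)]
    # Multiplying by R^T means taking dot products with the columns of R.
    return [sum(rotation[r][c] * d[r] for r in range(3)) for c in range(3)]

def local_geometry(scene_points, rotation, translation, radius):
    """Keep scene points within `radius` of the pose origin, in pose coordinates.

    `radius` is a hypothetical crop size, not a value taken from the paper.
    """
    local = (to_local_frame(p, rotation, translation) for p in scene_points)
    return [q for q in local if math.sqrt(sum(x * x for x in q)) <= radius]

# Hypothetical sample pose: rotated 90 degrees about z, located at (1, 0, 0).
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]

# Toy scene: one point near the pose, one far away.
scene = [[1.0, 0.5, 0.0], [5.0, 5.0, 5.0]]
nearby = local_geometry(scene, R, t, radius=1.0)
```

Because the cropped points are written in the pose's own coordinates, the same local patch looks identical wherever the pose sits in the scene, which is one plausible reading of why a locally conditioned score can be predicted efficiently.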

Taxonomy

Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Advanced Vision and Imaging