SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations

Yegyu Han; Taegyoon Yoon; Dayeon Woo; Sojeong Kim; Hyung-Sin Kim

arXiv:2507.05751·cs.CV·March 20, 2026

TL;DR

SenseShift6D introduces a comprehensive RGB-D dataset capturing real-world environmental and sensor variations, revealing significant robustness challenges in current 6D pose estimation methods and emphasizing the need for sensor-aware solutions.

Contribution

This paper presents the first large-scale RGB-D dataset with diverse environmental and sensor variations, and demonstrates the sensitivity of existing pose estimators to these factors, highlighting new challenges and opportunities.

Findings

01

State-of-the-art estimators show performance drops across lighting and sensor changes.

02

Sensor- and environment-aware robustness is crucial for real-world 6D pose estimation.

03

Test-time multimodal sensor selection can significantly improve pose estimation accuracy.
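The third finding rests on an oracle comparison: for each frame, take the best score any sensor configuration achieves and compare the average against a fixed auto-exposure baseline. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The function name `oracle_selection`, the configuration names, and all scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def oracle_selection(scores: Dict[str, List[float]], baseline: str) -> Dict[str, float]:
    """Compare a fixed baseline configuration with a per-frame oracle.

    `scores` maps a sensor-configuration name to per-frame accuracy
    scores (higher is better). All names and values are hypothetical.
    """
    n_frames = len(scores[baseline])
    if any(len(v) != n_frames for v in scores.values()):
        raise ValueError("every configuration needs one score per frame")

    # Oracle: for each frame, keep the best score over all configurations.
    oracle = [max(v[i] for v in scores.values()) for i in range(n_frames)]

    baseline_mean = sum(scores[baseline]) / n_frames
    oracle_mean = sum(oracle) / n_frames
    return {
        "baseline": baseline_mean,
        "oracle": oracle_mean,
        "gain": oracle_mean - baseline_mean,
    }


# Made-up scores for three frames under three configurations.
scores = {
    "auto_exposure": [0.50, 0.75, 0.25],
    "exp_low_gain_high": [0.75, 0.50, 0.50],
    "exp_high_gain_low": [0.25, 1.00, 0.25],
}
result = oracle_selection(scores, baseline="auto_exposure")
print(result)
```

Because the oracle picks the configuration after seeing the score, it is an upper bound on what any practical test-time selection method could reach, not a deployable method in itself.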

Abstract

Recent advances in 6D object pose estimation have achieved high performance on representative benchmarks such as LM-O, YCB-V, and T-LESS. However, these datasets were captured under fixed illumination and camera settings, leaving the impact of real-world variations in illumination, exposure, gain, or depth-sensor mode largely unexplored. To bridge this gap, we introduce SenseShift6D, the first RGB-D dataset that physically sweeps 13 RGB exposures, 9 RGB gains, auto-exposure, 4 depth-capture modes, and 5 illumination levels. For six common household objects, we acquire 198.8k RGB and 20.0k depth images (i.e., 795.4k RGB-D scenes), providing 1,380 unique sensor-lighting permutations per object pose. Experiments with state-of-the-art pretrained, generalizable pose estimators reveal substantial performance variation across lighting and sensor settings, despite their large-scale pretraining.…

Peer Reviews

Decision·ICLR 2026 Conference Desk Rejected Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The idea of incorporating several camera parameter variations into object pose dataset collection is nice.
- This benchmark explores RGB-D sensor parameter variation (especially photometric parameters) for 6D pose estimation. This is an interesting and important aspect of the object pose estimation problem.
- The benchmark captures real camera effects under different parameter variations. These effects are not easily reproduced in synthetic data, so the benchmark will be valuable for studying the problem.
- The

Weaknesses

- The dataset is not marker-less. There are many AR tags on the board beneath the object. This is acceptable, since other datasets such as LineMOD do the same, but the dataset remains far from "data in the wild".
- The dataset contains only 5 objects, which is small compared to related datasets.
- It contains only single-object tabletop scenes, without occlusion or multi-object scenes.
- Not sure if all RGB and depth sensors can control their modes to allow varying

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The paper is well written and structured.
2. The paper is well-motivated and contributes meaningfully to the 6D pose estimation research.
3. The extensive evaluation of existing algorithms on the benchmark presents many valuable insights.

Weaknesses

1. The ChArUco board is visible in the collected images, as in other well-known datasets. The question is whether the board could bias a pose estimation algorithm that is trained on a dataset including it. See the more detailed discussion in [1]. I suggest removing the calibration board once calibration is done, as in the YCB dataset.
2. The number of objects and the background of the objects are limited. Th

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- Introduction of an extensive 6D pose dataset that incorporates diverse illumination, exposure, gain, and depth levels. This enables benchmarking the robustness of pose estimation models in challenging real-world conditions.
- Comprehensive evaluation on different sensor configurations with pretrained models as well as instance-level pose estimation models.
- The AUC with optimal sensor control is at least +15 points higher compared to the baseline auto-exposure sensor configurations, indicating t

Weaknesses

- There is no validation split, instead there is only a training and test split, which might lead to potential overfitting of the evaluated models.
- The motivation is sensor-aware test-time adaptation, but the paper only shows the Oracle upper bound and not a single practical adaptation method.
- Limited number of objects in the dataset (5 objects) may lead to non-generalizable conclusions.

Code & Models

Repositories

yegyu-han/senseshift6d
PyTorch · Official

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Robotics and Sensor-Based Localization