ORIDa: Object-centric Real-world Image Composition Dataset

Jinwoo Kim; Sangmin Han; Jinho Jeong; Jiwoo Choi; Dongyoung Kim; Seon Joo Kim

arXiv:2506.08964 · cs.CV · June 11, 2025

TL;DR

ORIDa is a large-scale, real-world image dataset designed to facilitate research in object compositing, featuring diverse scenes, objects, and both factual and counterfactual image sets to better mimic real-world scenarios.

Contribution

The paper introduces ORIDa, the first large-scale, real-captured dataset with diverse object placements and scene contexts for advancing object compositing research.

Findings

01

ORIDa contains over 30,000 real-captured images featuring 200 unique objects.

02

The dataset provides two types of data: factual-counterfactual sets and factual-only scenes.

03

Extensive analysis demonstrates ORIDa's utility for object compositing research.

Abstract

Object compositing, the task of placing and harmonizing objects in images of diverse visual scenes, has become an important task in computer vision with the rise of generative models. However, existing datasets lack the diversity and scale required to comprehensively explore real-world scenarios. We introduce ORIDa (Object-centric Real-world Image Composition Dataset), a large-scale, real-captured dataset containing over 30,000 images featuring 200 unique objects, each of which is presented across varied positions and scenes. ORIDa has two types of data: factual-counterfactual sets and factual-only scenes. The factual-counterfactual sets consist of four factual images showing an object in different positions within a scene and a single counterfactual (or background) image of the scene without the object, resulting in five images per scene. The factual-only scenes include a single image…
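The layout of a factual-counterfactual set described in the abstract (four factual images plus one background image, five per scene) can be sketched as a small record type. This is only an illustrative sketch: the class name, field names, file paths, and the `background_factual_pairs` helper are hypothetical and do not come from the dataset's actual release format or tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple

# From the abstract: each factual-counterfactual set holds four factual images.
FACTUALS_PER_SET = 4


@dataclass(frozen=True)
class FactualCounterfactualSet:
    """One scene: four factual images plus one object-free background image.

    All field names are hypothetical; the real dataset may be organized differently.
    """
    scene_id: str
    object_id: str
    factual_paths: Tuple[str, ...]   # object shown at different positions
    counterfactual_path: str         # same scene without the object

    def __post_init__(self) -> None:
        # Enforce the four-factual-images-per-scene structure.
        if len(self.factual_paths) != FACTUALS_PER_SET:
            raise ValueError(
                f"expected {FACTUALS_PER_SET} factual images, "
                f"got {len(self.factual_paths)}"
            )

    def __len__(self) -> int:
        # Four factual images + one counterfactual image.
        return len(self.factual_paths) + 1

    def background_factual_pairs(self) -> List[Tuple[str, str]]:
        # Pair the background with each factual image of the same scene.
        return [(self.counterfactual_path, p) for p in self.factual_paths]


# Hypothetical example with made-up identifiers and paths.
example = FactualCounterfactualSet(
    scene_id="scene_0001",
    object_id="object_001",
    factual_paths=("f0.jpg", "f1.jpg", "f2.jpg", "f3.jpg"),
    counterfactual_path="bg.jpg",
)
pairs = example.background_factual_pairs()
```

Pairing the background with each factual image is one plausible way such a set could be consumed, for example as input and reference images when evaluating a compositing model; the abstract as shown here does not specify how the sets are used.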

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection