DORAEMON: A Unified Library for Visual Object Modeling and Representation Learning at Scale

Ke Du; Yimin Peng; Chao Gao; Fan Zhou; Siqiao Xue

arXiv:2511.04394 · cs.CV · November 7, 2025

TL;DR

DORAEMON is an open-source PyTorch library that unifies visual object modeling and representation learning, offering one scalable, reproducible platform for classification, retrieval, and metric learning, from research through deployment.

Contribution

DORAEMON introduces a YAML-driven framework that consolidates datasets, models, and training techniques into a single workflow, enabling rapid experimentation and deployment in visual recognition.

Findings

01

Reproducible recipes match or exceed reference results on ImageNet-1K, MS-Celeb-1M, and Stanford Online Products.

02

Exposes more than 1000 pretrained backbones through a timm-compatible interface, alongside modular losses, augmentations, and distributed-training utilities.

03

Provides one-command export to ONNX or HuggingFace for deployment.

Abstract

DORAEMON is an open-source PyTorch library that unifies visual object modeling and representation learning across diverse scales. A single YAML-driven workflow covers classification, retrieval, and metric learning; more than 1000 pretrained backbones are exposed through a timm-compatible interface, together with modular losses, augmentations, and distributed-training utilities. Reproducible recipes match or exceed reference results on ImageNet-1K, MS-Celeb-1M, and Stanford Online Products, while one-command export to ONNX or HuggingFace bridges research and deployment. By consolidating datasets, models, and training techniques into one platform, DORAEMON offers a scalable foundation for rapid experimentation in visual recognition and representation learning, enabling efficient transfer of research advances to real-world applications. The repository is available at…
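To illustrate what a config-driven workflow with modular losses can look like in general, here is a minimal Python sketch of a registry that resolves a parsed config entry into a component. This is not DORAEMON's API: `REGISTRY`, `register`, `build`, the loss names, and the config keys are all hypothetical, and the "losses" are plain dictionaries standing in for real PyTorch modules. The `config` dict plays the role of a YAML file after parsing.

```python
# Hypothetical sketch of a config-driven component registry.
# None of these names are taken from DORAEMON itself.
from typing import Any, Callable, Dict

# Maps a component kind (e.g. "loss") to {name: builder function}.
REGISTRY: Dict[str, Dict[str, Callable[..., Any]]] = {}


def register(kind: str, name: str):
    """Decorator that records a builder under REGISTRY[kind][name]."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY.setdefault(kind, {})[name] = fn
        return fn
    return wrap


@register("loss", "cross_entropy")
def build_cross_entropy(label_smoothing: float = 0.0) -> Dict[str, Any]:
    # Placeholder: a real library would return a loss module here.
    return {"type": "cross_entropy", "label_smoothing": label_smoothing}


@register("loss", "triplet")
def build_triplet(margin: float = 0.2) -> Dict[str, Any]:
    # Placeholder for a metric-learning loss.
    return {"type": "triplet", "margin": margin}


def build(kind: str, spec: Dict[str, Any]) -> Any:
    """Resolve a config entry such as {"name": "triplet", "margin": 0.3}."""
    try:
        builder = REGISTRY[kind][spec["name"]]
    except KeyError:
        raise ValueError(f"unknown {kind}: {spec.get('name')!r}") from None
    # Every key except "name" is passed through as a keyword argument.
    params = {k: v for k, v in spec.items() if k != "name"}
    return builder(**params)


# Stand-in for a parsed YAML file; the keys are assumptions for illustration.
config = {
    "task": "retrieval",
    "loss": {"name": "triplet", "margin": 0.3},
}

loss = build("loss", config["loss"])
print(loss)
```

Under this pattern, moving from a classification run to a metric-learning run is a change to the config entry rather than to the training code, which is the general appeal of a single config-driven workflow.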

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Face Recognition and Perception · Domain Adaptation and Few-Shot Learning